A comparison of cross-correlation-based and phase-correlation-based image registration algorithms for optical coherence tomographic angiography

Yurui Pu; Chaoliang Chen

doi:10.3788/COL202422.071101

1. Introduction

Optical coherence tomography (OCT) is a near-infrared imaging technology that was proposed in the 1990s^[1] and is now widely used in ophthalmology^[2–5], dentistry^[6–8], gastroenterology^[9,10], and other fields. One of its advantages is the ability to image biological tissues non-invasively at high resolution. At present, the typical lateral scanning range of OCT is 2–6 mm^[11], which is too small for most clinical scenarios, even for ophthalmology and endoscopy, not to mention intraoperative brain or kidney imaging^[12]. There are two ways to extend the limited imaging range: one is to increase the optical scanning range of the sample arm^[13,14], which presents challenges for optical design and manufacturing, and the other is to apply registration algorithms that merge multiple small-range images into a single large-scale image, which is the more feasible of the two.

With the rapid development of OCT, there is also a soaring demand for advanced image processing and analysis techniques. Image registration serves three main purposes in OCT: noise suppression, multimodal image registration, and obtaining larger-field images. For noise suppression, Liu et al.^[15] proposed an image registration algorithm using a regularized dynamic programming method, which corrected both axial and transverse motion offsets, effectively suppressing speckle noise and improving image quality. Wei et al.^[16] introduced a 3D registration method for retinal OCT volumes, which applied a non-rigid registration method that combined normalized mutual information-based registration with landmark-based coherent point drift (CPD) registration. For multimodal image registration, Zang et al.^[17] proposed a feature-based registration method, which identified a set of control points (CPs) and computed the corresponding feature vectors to find the best CP match for registering fundus photographs to spectral domain optical coherence tomography (SD-OCT) projection images.

In light of advancements in computer technology and image processing techniques, grayscale features of images have remained a prominent research topic for registration algorithms^[18], among which normalized cross-correlation (NCC) is the most common and widely used^[19]. However, the NCC-based image registration method cannot handle rotation. One image registration algorithm that addresses both translation and rotation is the Fourier–Mellin transform (FMT)^[20], but it needs a high overlap rate between images to collect enough features for proper image alignment, which degrades patient scanning efficiency in clinical applications. In this work, we performed cross-correlation (instead of phase-correlation) in the workflow of the FMT method (called the dual-cross-correlation-based translation and rotation registration method, DCCTRR) to address both translation and rotation mismatches between two optical coherence tomography angiography (OCTA) images. Both phantom and in vivo experiments were carried out to compare the performance. The results quantitatively demonstrate that the DCCTRR can align OCTA images with a lower overlap rate than the FMT method, which could improve the scanning efficiency of large-scale imaging in clinical applications.

2. Method

The workflow of the DCCTRR method is demonstrated by the flowchart in Fig. 1, where both rotation and translation offsets could be obtained by the following steps:

Figure 1. Workflow of the dual-cross-correlation-based translation and rotation registration (DCCTRR) algorithm.

First, perform a Fourier transform on both images. This transforms the images from the spatial domain to the frequency domain and allows image processing operations (such as filtering, noise reduction, and contrast enhancement) to be performed prior to image registration, so as to improve the registration accuracy.

Then, transform the images to log-polar coordinates. Taking the center of the frequency-domain image as the origin $(x_{0}, y_{0})$, we can convert each point $(x, y)$ in the frequency domain into the log-polar coordinate system $[\ln(r), θ]$ with Eq. (1) and obtain a log-polar image^[21], $\left\{ \begin{matrix} r = \sqrt{(x - x_{0})^{2} + (y - y_{0})^{2}} \\ θ = \tan^{-1}(y - y_{0}, x - x_{0}) \end{matrix} \right. .$ (1)
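
As an illustration of Eq. (1), the following Python sketch resamples a centered spectrum magnitude onto a $[\ln(r), θ]$ grid. It is a minimal example that assumes NumPy and nearest-neighbor sampling; the function name `log_polar` and the grid sizes `n_radii` and `n_angles` are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np

def log_polar(magnitude, n_radii=64, n_angles=90):
    """Resample a centered 2-D array onto a (ln r, theta) grid.

    Illustrative only: nearest-neighbor sampling, origin at the array center.
    """
    rows, cols = magnitude.shape
    y0, x0 = rows // 2, cols // 2            # origin (x0, y0) of Eq. (1)
    r_max = min(x0, y0) - 1                  # largest radius that stays inside the array
    # Radii are spaced uniformly in ln(r), from r = 1 up to r = r_max.
    log_r = np.linspace(0.0, np.log(r_max), n_radii)
    theta = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    r_grid, t_grid = np.meshgrid(np.exp(log_r), theta, indexing="ij")
    # Inverse of Eq. (1): x = x0 + r cos(theta), y = y0 + r sin(theta).
    x = np.rint(x0 + r_grid * np.cos(t_grid)).astype(int)
    y = np.rint(y0 + r_grid * np.sin(t_grid)).astype(int)
    return magnitude[y, x]                   # shape (n_radii, n_angles)
```

Under this mapping, a rotation of the input becomes a cyclic shift along the $θ$ axis, which is the quantity the subsequent correlation step searches for.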

The next step is to align the two log-polar images based on their normalized cross-correlation coefficient map. The NCC between the two images is expressed by^[22,23] $NCC[\ln(r), θ] = \frac{\sum f[\ln(r)', θ'] \cdot g[\ln(r)' - \ln(r), θ' - θ]}{\sqrt{\sum f[\ln(r)', θ']^{2}} \sqrt{\sum g[\ln(r)' - \ln(r), θ' - θ]^{2}}}, \quad -1 < NCC[\ln(r), θ] < 1,$ (2) where $NCC[\ln(r), θ]$ represents the normalized cross-correlation value at position $[\ln(r), θ]$; $f[\ln(r)', θ']$ denotes the pixel value in the first image, typically the template image; and $g[\ln(r)' - \ln(r), θ' - θ]$ represents the pixel value in the second image, often the larger image or target image, after a spatial displacement. The summation is performed over the template and the spatially displaced image. This formula quantifies the similarity between the template and the target image to determine the matching position of the template within the target image. The normalization mitigates the influence of brightness and contrast and enhances the robustness of the matching process. The coordinates $[\ln(r_{\max}), θ_{\max}]$ of the peak in the NCC map give the scale and rotation-angle differences, respectively, between the two images. Taking the center of the log-polar image as the origin point $[\ln(r_{0}), θ_{0}]$, the zoom ratio between the two images can also be obtained from the following formula: $S = \frac{\ln(r_{\max}) - \ln(r_{0})}{\ln(r_{0})},$ (3) where $S$ represents the image scaling ratio. However, in OCT imaging, lateral scanning is commonly achieved by a pair of galvo mirrors, which provide a stable scanning pattern owing to their high scanning repeatability and linearity. Therefore, the scaling factor is negligible and is not taken into account in this work.
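
The peak search on the NCC map can be sketched as follows. This is a simplified illustration that assumes NumPy, equal-sized log-polar images, and circular (wrap-around) shifts evaluated with FFTs, under which the denominator of Eq. (2) is constant. The names `circular_ncc`, `peak_shift`, and `shift_to_degrees` are hypothetical, and the sign convention of the recovered angle depends on how the $θ$ axis is laid out.

```python
import numpy as np

def circular_ncc(f, g):
    """NCC map of Eq. (2) for circular shifts, evaluated with FFTs."""
    # corr[s] = sum_p f[p] * g[p - s], computed for every shift s at once.
    corr = np.fft.ifft2(np.fft.fft2(f) * np.conj(np.fft.fft2(g))).real
    # For circular shifts the energies of f and of the shifted g are constant.
    return corr / (np.linalg.norm(f) * np.linalg.norm(g))

def peak_shift(ncc_map):
    """Signed (row, column) shift at which the NCC map is maximal."""
    peak = np.unravel_index(np.argmax(ncc_map), ncc_map.shape)
    # Indices beyond the midpoint wrap around to negative shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, ncc_map.shape))

def shift_to_degrees(column_shift, n_angles):
    """Convert a shift along the theta axis (columns) into degrees."""
    return column_shift * 360.0 / n_angles
```

With the $θ$ axis sampled at `n_angles` columns over a full turn, a column shift of 5 on a 90-column grid would correspond to a rotation of 20°.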

The last step of the rotation alignment is to rotate the image by the calculated angle $θ_{rotate}$ about the origin point.

In this process, one image is rotated, which causes a difference in size between the two images. For translation alignment, we first match the sizes of the two images by zero-padding the reference image. Then, we calculate the NCC coefficient map and find the coordinates of its maximum value $(x_{\max}, y_{\max})$. The translation offset can then be obtained and expressed by $\left\{ \begin{matrix} x_{shift} = x_{\max} - \frac{1}{2} P_{x} \\ y_{shift} = y_{\max} - \frac{1}{2} P_{y} \end{matrix} \right. ,$ (4) where $x_{shift}$ and $y_{shift}$ represent the amounts of translation along the $x$-axis and $y$-axis, respectively, $P_{x}$ is the pixel number along the $x$-axis, and $P_{y}$ is the pixel number along the $y$-axis.
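
A minimal sketch of this translation step is given below. It assumes NumPy, zero-padding at the bottom/right edges, and an FFT-based circular cross-correlation whose map is centered with `fftshift`, so that zero shift sits at $(P_{x}/2, P_{y}/2)$ as Eq. (4) implies. The global normalization of the NCC is omitted because it does not move the peak. The names `pad_to` and `translation_offset` are hypothetical.

```python
import numpy as np

def pad_to(image, shape):
    """Zero-pad an image at the bottom/right up to `shape`."""
    out = np.zeros(shape, dtype=float)
    out[:image.shape[0], :image.shape[1]] = image
    return out

def translation_offset(reference, moving):
    """Return (x_shift, y_shift) of `moving` relative to `reference`, Eq. (4)."""
    shape = (max(reference.shape[0], moving.shape[0]),
             max(reference.shape[1], moving.shape[1]))
    f, g = pad_to(moving, shape), pad_to(reference, shape)
    corr = np.fft.ifft2(np.fft.fft2(f) * np.conj(np.fft.fft2(g))).real
    corr = np.fft.fftshift(corr)             # zero shift moves to (P_y // 2, P_x // 2)
    y_max, x_max = np.unravel_index(np.argmax(corr), corr.shape)
    p_y, p_x = corr.shape
    # Eq. (4), with integer division standing in for P / 2.
    return int(x_max) - p_x // 2, int(y_max) - p_y // 2
```

Because the correlation is circular, this sketch only resolves shifts up to half the padded image size in each direction.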

The last step of the DCCTRR is to apply the obtained translation compensation $x_{shift}$ and $y_{shift}$ to the image and merge them into one picture.

The basic principle of the FMT method can be found in Guo’s work^[20]. In brief, the workflow of the FMT method is similar to Fig. 1; the difference is that the maximum value of the phase-correlation map is used for rotation and translation alignment. The phase-correlation map is obtained by $PCC(x, y) = F^{-1}\left[\frac{\hat{f}(x, y)\, \hat{g}^{*}(x, y)}{|\hat{f}(x, y)\, \hat{g}(x, y)|}\right],$ (5) where $PCC(x, y)$ represents the phase-correlation value at position $(x, y)$; $\hat{f}(x, y)$ and $\hat{g}(x, y)$ denote the Fourier transforms of the first and second images, respectively; $\hat{g}^{*}(x, y)$ represents the complex conjugate of $\hat{g}(x, y)$; and $F^{-1}$ denotes the inverse Fourier transform. As with the normalized cross-correlation coefficient, the coordinates of the maximum value in the phase-correlation map indicate the offsets between the two mismatched images.
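
Eq. (5) can be written compactly with NumPy FFTs. The sketch below is illustrative only; the function name `phase_correlation` and the small constant `eps`, added to avoid division by zero, are assumptions rather than part of the original method description.

```python
import numpy as np

def phase_correlation(f, g, eps=1e-12):
    """Phase-correlation map of Eq. (5) for two equal-sized images."""
    f_hat, g_hat = np.fft.fft2(f), np.fft.fft2(g)
    cross = f_hat * np.conj(g_hat)           # numerator of Eq. (5)
    # |f_hat * conj(g_hat)| equals |f_hat * g_hat|, the denominator of Eq. (5).
    return np.fft.ifft2(cross / (np.abs(cross) + eps)).real
```

For a pure circular shift between the two images, the resulting map is close to a single spike at the shift position, which is the sharp-peak behavior discussed for the FMT method.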

3. Results

In this work, the signal-to-noise ratio (SNR) and registration accuracy are used to evaluate the performance of the DCCTRR and FMT methods, and the SNR is calculated by dividing the maximum correlation coefficient value by the standard deviation of the whole correlation map, $SNR = \frac{z_{\max}}{σ},$ (6)where $z_{\max}$ represents the maximum correlation coefficient value, and $σ$ represents the standard deviation (STD) of the whole correlation map.

The registration accuracy of the overlapped region after alignment is defined as the ratio of the pixel number of the matched region (with signals in both images) to the average of the pixel numbers of the signaled regions in the two images, which can be expressed by $Accuracy = \frac{N_{overlap}}{\frac{1}{2} (N1_{signal} + N2_{signal})},$ (7) where $N_{overlap}$ represents the pixel number of the matched signaled region, and $N1_{signal}$ and $N2_{signal}$ represent the pixel numbers of the signaled regions in the two binarized images, respectively.
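
Both evaluation metrics are straightforward to compute. The following sketch of Eqs. (6) and (7) assumes NumPy, a real-valued correlation map, and two binarized images that have already been cropped to the overlapped region; the function names are hypothetical.

```python
import numpy as np

def correlation_snr(corr_map):
    """Eq. (6): peak of the correlation map divided by its standard deviation."""
    return float(np.max(corr_map) / np.std(corr_map))

def registration_accuracy(binary_1, binary_2):
    """Eq. (7): matched signaled pixels over the mean number of signaled pixels."""
    a, b = np.asarray(binary_1, bool), np.asarray(binary_2, bool)
    n_overlap = np.count_nonzero(a & b)      # signal present in both images
    mean_signal = 0.5 * (np.count_nonzero(a) + np.count_nonzero(b))
    return n_overlap / mean_signal           # undefined if both images are empty
```

As a small check, two four-pixel rows `[1, 1, 0, 0]` and `[0, 1, 1, 0]` share one signaled pixel out of an average of two, giving an accuracy of 0.5.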

Image stitching plays a key role in medical imaging for disease diagnosis, because poorly stitched large-region images may lead to an incorrect diagnosis and cause harm to patients. With an improved correlation-map SNR and registration accuracy, the image registration process is more reliable and the output is more accurate, which could better support a doctor’s judgment in choosing a proper treatment for the patient.

3.1. System set-up of our home-built SDOCT system

All OCT/OCTA images shown in this work were obtained with our home-built SDOCT system, as illustrated in Fig. 2. The light source is a superluminescent diode (SLD; cBLMD-T-850-HP-I, SUPERLUM, Ireland) with a central wavelength of 850 nm and a 3 dB bandwidth of 165 nm. The system is based on a Michelson interferometer, in which the optical beam is divided into a reference beam and a sample beam by a 50:50 fiber coupler. The incident beam in the reference arm is reflected by a mirror, and the backscattered beam from the sample arm interferes with it at the fiber coupler. The interference signal is then directed to a high-resolution spectrometer for detection. In the system, L1, L2, and L4 are collimating lenses with a focal length of 19 mm in the reference arm, sample arm, and spectrometer, respectively. L3 is an objective lens with a focal length of 30 mm, providing a lateral resolution of $\sim 6.2\ μm$. L5 is a custom-made Fourier lens with an effective focal length of 50 mm, formed by combining four achromatic lenses with focal lengths of 200 mm, as described in Chen’s work^[24]. All lenses in this system are from Thorlabs, USA. The grating (FSTG-NIR1500-908, Ibsen, UK) has a line density of 1500 lp/mm. The detector is a line-scan camera (2048 pixels, OCTOPLUS, E2V, UK) with an A-line rate of 130 kHz.

Figure 2. Schematic of our home-built SDOCT system. L1–L5, achromatic lenses; PC1–PC2, polarization controllers; DP, dispersion compensation pair.

3.2. Phantom experiments

For the phantom experiments, we first tested a single cross-correlation-based method on image pairs with translation only and with both translation and rotation; the results are shown in Figs. 3 and 4, where silk fabric with a fiber diameter of $\sim 32\ μm$ was used to mimic microvasculature. The cross-correlation map of an image pair, shown in Fig. 3(a), has a cone shape. The basic principle of image registration with the normalized cross-correlation method is to locate the maximum of the correlation coefficient map and then register the images using the coordinates of that peak. Taking the peak value as the signal, the relationship between the signal-to-noise ratio (SNR) and the overlap rate is shown in Fig. 3(b): the SNR increases with the overlap rate. The relationship between the registration accuracy and the overlap rate is shown in Fig. 3(c). The minimum overlap rate for successful registration with the cross-correlation method is 9.81%, with a corresponding registration accuracy of 92.74%. Overall, the registration accuracy rises as the overlap rate increases, but the curve oscillates slightly because the input images have a net structure with periodically repeating intensity. Figures 3(d)–3(h) show the registered images for five representative overlap rates, and Figs. 3(i)–3(m) show the corresponding RGB images obtained after aligning the two binary images. In the RGB images, the red and green channels hold the two original binary images, so positions with signal in both channels appear yellow, which makes the registration quality easy to inspect. The algorithm did not fail until the overlap rate dropped below 9.81%, where its accuracy was 92.74%, as shown in Fig. 3(h).

Figure 4 shows how the cross-correlation-based registration algorithm performs on images with relative rotation. Figure 4(a) is the same as Fig. 3(d); both are the registration result of the phantom experiment at an overlap rate of 96.6%. Figures 4(b)–4(e) show the registration results after rotating one of the original images in Fig. 4(a) by 1°–4°. The RGB images in Figs. 4(f)–4(j) correspond to the registration results in Figs. 4(a)–4(e); again, the red and green channels hold the two original binary images, and positions with signal in both channels appear yellow. The relationship between the rotation angle and the registration SNR is shown in Fig. 4(k), and Fig. 4(l) shows the corresponding registration accuracy. On the curves, the points indicate the actual sampling points. Both the SNR and the registration accuracy clearly decrease as the rotation angle increases. For rotations of 5° or more, the algorithm fails and the images cannot be registered correctly.
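
The red/green overlay used to inspect the registration results can be reproduced with a few lines of NumPy. This is a sketch assuming two aligned binary images of equal shape; `overlay_rgb` is a hypothetical name.

```python
import numpy as np

def overlay_rgb(binary_a, binary_b):
    """Put two aligned binary images into the red and green channels."""
    a, b = np.asarray(binary_a, bool), np.asarray(binary_b, bool)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape after alignment")
    rgb = np.zeros(a.shape + (3,), dtype=np.uint8)
    rgb[..., 0] = np.where(a, 255, 0)        # red channel: first image
    rgb[..., 1] = np.where(b, 255, 0)        # green channel: second image
    return rgb                               # red + green displays as yellow
```

Pixels that carry signal in only one image stay pure red or pure green, so misregistered structures stand out against the yellow matched regions.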

Figure 3. Results of the phantom experiment (silk fabric with a silk fiber diameter of ∼ 32 µm is used to mimic biological microvasculature) with the normalized cross-correlation method. (a) The normalized cross-correlation coefficient map between an image pair. The horizontal coordinate represents the position of the registered image relative to the reference image, and the vertical coordinate represents the cross-correlation coefficient value at that corresponding position. (b) The plot of the image overlap rates versus the normalized SNR of the correlation coefficient map. (c) The plot of different image overlap rates versus normalized registration accuracy. (d)–(h) Merged images of five pairs of representative images with different overlap rates. (i)–(m) Merged RGB images. The red and green channels are the two original binary images, and the matched signaled regions are yellow.

Figure 4. Results of image registration with the cross-correlation method for images with rotations. (a)–(e) The images with rotation angles of 0°–4°, respectively, relative to the same reference image; the overlap rates are all 96.6%. (f)–(j) The merged RGB images of (a)–(e), respectively. The red and green channels refer to the two original binary images, and the yellow indicates the matched signaled regions. (k) The plot of different rotation angles versus the obtained normalized SNR of the correlation coefficient. (l) The plot of different rotation angles versus the normalized registration accuracy.

Image registration based on normalized cross-correlation alone cannot cope with rotation, whereas the DCCTRR and FMT methods can. Therefore, we carried out phantom experiments with both methods. The two methods registered the same pairs of images at the same rotation angle (9.3769°), and their results are compared side by side in Fig. 5. This angle was chosen to explore the limit of the overlap rate because, in practical ophthalmic OCT, patients remain stationary while being scanned, so the rotation mismatch can be assumed to be smaller than 10°.

Figure 5. Results of the comparison between the DCCTRR and FMT methods. In the coefficient map, the horizontal coordinate represents the position of the registered image relative to the reference image, and the vertical coordinate represents the correlation coefficient value at that corresponding position. (a) The normalized cross-correlation coefficient map. (b) The phase-correlation coefficient map. (c) The plot of the normalized SNRs of the correlation coefficient map versus different overlap rates. (d) The plot of the normalized registration accuracies versus different overlap rates. (e)–(i) Merged images of five representative overlap rates with the DCCTRR method (the first and the last are the images with the minimum and the maximum overlap rates, respectively). (j)–(n) The merged RGB images of (e)–(i), respectively. (o)–(s) Merged images of the same five pairs of images as (e)–(i) with the FMT method. (t)–(x) The merged RGB images of (o)–(s), respectively.

The correlation maps used by the DCCTRR and FMT methods are the cross-correlation and phase-correlation maps shown in Figs. 5(a) and 5(b), respectively, and the difference between them is evident: the cross-correlation map is a continuous surface, whereas the phase-correlation map is discontinuous and composed of numerous sharp vertical lines. This sharpness gives the FMT method high accuracy and a high SNR. However, a higher SNR does not guarantee better registration performance. As shown in Fig. 5(c), the FMT method failed once the overlap rate fell below 61.86%, whereas the DCCTRR registered images successfully until the overlap rate fell below 12.58%; the normalized SNRs of the two methods varied similarly with the overlap rate. Figures 5(e)–5(i) show the registration results of the DCCTRR for five representative overlap rates. The first image represents the registration result with the maximum overlap rate, the fifth image represents the result with the minimum overlap rate, and the intermediate images are equally spaced. The RGB images in Figs. 5(j)–5(n) correspond to the registration results in Figs. 5(e)–5(i); the red and green channels hold the two original binary images, and yellow indicates the matched signaled regions. Figures 5(o)–5(s) show the registration results of the FMT method for five representative overlap rates, and the RGB images in Figs. 5(t)–5(x) correspond to Figs. 5(o)–5(s). The relationships of the registration SNR and the registration accuracy to the overlap rate for the two methods are shown in Figs. 5(c) and 5(d), respectively.

Both the SNR and the registration accuracy increase with the overlap rate. In Figs. 5(c) and 5(d), the blue lines represent the DCCTRR results, the red lines represent the FMT results, and the points on the curves indicate the actual sampling points. As shown in Fig. 5(d), the DCCTRR still registers images successfully at an overlap rate of 12.58%, with a registration accuracy of 88.51%, whereas the FMT method fails when the overlap rate is less than or equal to 61.86%. Compared with the FMT method, the normalized accuracy of the DCCTRR decreases more slowly as the overlap rate decreases, indicating superior performance.

To assess the robustness of the DCCTRR to background noise, Gaussian noise was added to two en face OCT images before registration, and the noise floor was gradually increased until the registration failed. The OCTA images used are shown in Fig. 6(a), with an overlap rate of 90.708% and a relative rotation angle of 9.3769°. The maximum STDs of the added Gaussian noise tolerated by the DCCTRR and FMT methods are 2.52 and 0.43, respectively. The aligned results are shown in Figs. 6(b) and 6(c), and the merged RGB images of Figs. 6(a)–6(c) are shown in Figs. 6(d)–6(f), respectively. Figures 6(g) and 6(h) show that the normalized SNRs and the normalized accuracies of both the DCCTRR and FMT methods decrease as the STD of the Gaussian noise increases, as expected from theory. However, the DCCTRR curves have a smaller slope, indicating higher robustness to background noise.
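
The noise sweep described above can be sketched as a simple loop. The example below is illustrative only: it assumes NumPy, uses a plain FFT-based circular cross-correlation as a stand-in for the full registration pipeline, and treats a registration as successful when the known shift is recovered exactly. The names `estimate_shift` and `max_tolerated_noise`, the fixed `seed`, and the list of STD values are all hypothetical, and the noise units are not those of the experiment.

```python
import numpy as np

def estimate_shift(f, g):
    """(row, column) index of the circular cross-correlation peak."""
    corr = np.fft.ifft2(np.fft.fft2(f) * np.conj(np.fft.fft2(g))).real
    return tuple(int(i) for i in np.unravel_index(np.argmax(corr), corr.shape))

def max_tolerated_noise(f, g, true_shift, stds, seed=0):
    """Largest STD in `stds` (tested in order) before registration first fails."""
    rng = np.random.default_rng(seed)
    tolerated = None
    for std in stds:
        # Independent Gaussian noise is added to both images.
        noisy_f = f + rng.normal(0.0, std, f.shape)
        noisy_g = g + rng.normal(0.0, std, g.shape)
        if estimate_shift(noisy_f, noisy_g) != tuple(true_shift):
            break                            # registration failed at this noise level
        tolerated = std
    return tolerated
```

Replacing `estimate_shift` with a phase-correlation estimator would allow the same sweep to be run for both correlation types on identical noisy inputs.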

Figure 6. DCCTRR and FMT methods for registering images with added Gaussian noise. (a) Registered images without added Gaussian noise. Merged images using the (b) DCCTRR and (c) FMT methods with the maximum STD of the added Gaussian noise. The merged RGB images of (a), (b), and (c) are shown in (d), (e), and (f), respectively. (g) The plot of the normalized SNRs of the coefficient map versus the STDs of the added Gaussian noise. (h) The plot of the normalized registration accuracies versus the STDs of the added Gaussian noise.

3.3. In vivo experiments

In vivo experiments were also conducted to investigate the performance of the two methods. A human finger was scanned, and the obtained en face images were used for testing; the results are shown in Fig. 7. Figures 7(a)–7(f) show the registered images for six representative overlap rates. The first image represents the registration result with the maximum overlap rate, the sixth image represents the result with the minimum overlap rate, and the intermediate images are equally spaced. The RGB images in Figs. 7(g)–7(l) correspond to the registration results in Figs. 7(a)–7(f). The red and green channels hold the two original binarized images, so positions with signal in both channels appear yellow. Figures 7(m) and 7(n) show the relationship between the overlap rate and the registration SNR and accuracy, respectively; the points on the curves indicate the actual sampling points. Both quantities increase with the overlap rate. The DCCTRR can successfully register two OCTA en face images of human finger capillaries with an overlap rate of 12.09%.

Figure 7. Registration results of human finger OCTA images with the DCCTRR method. (a)–(f) Merged images of six representative overlap rates (the first one and the last one, respectively, have the maximum and the minimum overlap rates). (g)–(l) The corresponding RGB images of (a)–(f), respectively. (m) The plot of the normalized SNRs of the correlation coefficient map versus the different overlap rates. (n) The plot of the normalized registration accuracies versus the different overlap rates.

In the aforementioned in vivo experiment, the rotational deviation between the two images was measured to be 0.1839°. It is important to note that practical applications may involve larger rotational deviations, which can result in a reduced overlap rate compared to the parameters here. Hence, it is recommended to employ an overlap rate of 20% for practical applications, which is slightly higher than the threshold value.

It is worth mentioning that the FMT method fails to register the images of Fig. 7(a), which have an overlap rate of 30.238%. This is consistent with the results of the phantom experiments, where the overlap rate for the FMT method needs to be $> 61.86\%$.

Finally, we performed multiple scans of the volunteer’s finger with an overlap rate of 20% ($3 \times 3$ regions, each covering $\sim 2\ \mathrm{mm} \times 2\ \mathrm{mm}$) and used the DCCTRR to stitch all local images; the results are shown in Fig. 8. The scanned area corresponds to the marked position on the ring finger in Fig. 8(a), and the resulting mosaic image is shown in Fig. 8(b).
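
As a rough guide to the field of view gained by such a mosaic, the nominal side length can be estimated from the tile size and overlap rate. The short calculation below assumes that the 20% overlap rate applies as a linear fraction between neighboring tiles along each axis and that the tiles lie on a regular grid; `mosaic_extent_mm` is a hypothetical helper, and the value is a nominal estimate rather than a measured result.

```python
def mosaic_extent_mm(tile_mm, n_tiles, overlap_rate):
    """Nominal side length covered by n_tiles overlapping tiles along one axis."""
    step_mm = tile_mm * (1.0 - overlap_rate)  # displacement between neighboring tiles
    return tile_mm + (n_tiles - 1) * step_mm

side = mosaic_extent_mm(tile_mm=2.0, n_tiles=3, overlap_rate=0.20)
print(f"nominal mosaic side: {side:.1f} mm")
```

Under these assumptions, three 2 mm tiles per axis would cover roughly 5.2 mm, compared with 6 mm for tiles placed edge to edge without overlap.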

Figure 8. Merged large-scale OCTA images with 3 × 3 OCTA en face images. (a) The left hand of a healthy volunteer (the region marked by a solid line was scanned). (b) The merged large-scale image obtained by the DCCTRR method.

4. Discussion

The phase-correlation-based registration method is widely used in satellite-map and geographic-map stitching and performs better than the cross-correlation-based method because it is robust to image intensity changes, illumination, and noise. In contrast to the cross-correlation map, which is smooth and continuous with a peak that indicates the matched location, the phase-correlation map is rough, with sharp spikes that indicate the possible matching locations. This feature gives phase-correlation good registration accuracy and SNR. However, the phase noise in the phase-correlation method may also appear as spikes and degrade the registration accuracy. For example, in Eq. (6), the SNR is calculated by dividing the maximum peak by the STD of the entire map, whereas what affects the registration accuracy in the phase-correlation map is random phase noise that also appears as spikes. The image alignment fails if any phase-noise spike is larger than the corresponding phase-correlation signal.

The DCCTRR and FMT methods use cross-correlation and phase-correlation, respectively, for image registration, so their computational complexity and processing time also differ. The cross-correlation operation, given by Eq. (2), entails more multiplication and addition operations, along with convolving the entire image, leading to higher computational complexity. Phase-correlation, on the other hand, is typically computed in the frequency domain, leveraging the properties of the Fourier transform for efficient processing. It primarily involves a Fourier transform, complex multiplication, and an inverse Fourier transform, as shown in Eq. (5), resulting in lower computational complexity. We ran both methods on the same laboratory computer (Windows 10, 3.70 GHz CPU, 32 GB memory). The time required to align a pair of OCTA images (1000 pixel × 1000 pixel) with the FMT method was 0.8664 s, of which the two rounds of phase-correlation operations cost 0.2642 s (about 30.49% of the total time). In contrast, the DCCTRR method took 1.2522 s, of which the two rounds of cross-correlation operations cost 0.9145 s (about 73.03% of the total time). The results indicate that the cross-correlation calculation dominates the total computing time of the DCCTRR method. Therefore, for applications where (near) real-time processing is needed, parallel computing hardware (such as a GPU or an FPGA) could be used to accelerate the cross-correlation calculation, which may be part of our future research.

5. Conclusion

In summary, this work investigated the use of cross-correlation (instead of phase-correlation) in the workflow of the FMT method for calculating translation and rotation offsets (called the dual-cross-correlation-based translation and rotation registration method, DCCTRR) and compared its performance with that of the FMT method for OCTA image alignment. Both phantom and in vivo experiments were carried out for the comparison, and the results show that the DCCTRR method requires a smaller overlap rate between the images to achieve a registration accuracy comparable to that of the FMT method. With a reduced overlap rate for image stitching, patient scanning efficiency for large-scale imaging in clinical applications could be improved. Furthermore, the DCCTRR method may also be helpful when OCT is combined with a robotic arm for automatic patient scanning.

Category: Imaging Systems and Image Processing

Received: Jan. 15, 2024

Accepted: Mar. 12, 2024

Published Online: Jul. 17, 2024

The Author Email: Chaoliang Chen (chaoliangchen@seu.edu.cn)

CSTR:32184.14.COL202422.071101
