1 Introduction
In radar engineering, the scene of a radar illuminating a sea-ship target attracts lots of attention in microwave remote sensing. The electromagnetic scattering simulation of such large scale target has always been a challenge problem in the field of computational electromagnetics (CEM). The high-frequency method based on ray tracing has become one of the most practical solutions for these problems [1,2]. However, as the demands of engineering projects increase and the targets of interesting become more complex, the computational efficiency of traditional high-frequency methods has become inadequate.
Aiming to the mentioned problems, researchers have carried out relevant works on how to improve the scattering simulation of the acceleration technology by using high-frequency methods [3]. Brem et al. developed the acceleration techniques of the shooting and bouncing rays (SBR) method by using graphics processing unit (GPU) [4, 5], and Guo et al. explored some near-field scattering problems by using SBR [6–8]. Kee et al. adopted GPUs to accelerate the tracing process and introduced the GPU parallel implementation of the SBR-PO (physical optics) method in details [9,10]. Zheng et al. carried out some researches by using GPU to achieve efficient electromagnetic simulation technology for the full-scale sea-ship target scattering problem [11,12].
Furthermore, radar imaging simulations of sea-ship targets require massive simulated radar echo database. Therefore, in order to solve the computational difficulty caused by the large number of ray tubes, a parallel computing scheme based on the CPU-GPU parallel computing scheme and a multi-GPU framework is proposed in this paper. Finally, the radar images of ships at different seas are simulated and analyzed.
2 High-frequency method
The high-frequency methods include PO, SBR, physical theory of diffraction (PTD), etc. The PO method is derived from Kirchhoff ’s approximation to solve surface element scattering. The SBR method is based on the geometric optics (GO) method to solve electromagnetic scattering effects between surface elements. The PTD method expresses the edge scattering contribution for wedge problem [2].
The SBR method approximates the scattering of electromagnetic waves by using ray tubes. Since the composite sea-ship targets require the massive number of ray tubes to represent the effects of far-field radar waves. In the SBR process, an incident plane wave is modeled by a set of parallel ray tubes. These rays are launched from a plane orthogonal to incident wave direction. In the implementation, the required aperture of the plane wave can be bounded and is referred to as ray aperture. And the ray aperture can be estimated using bounding box containing the object [13], which should be meshed into ray tubes according to the ray density which is usually smaller than $ \lambda /10 $. Therefore, the number of ray tubes is in general up to billion level, giving rise to very huge time-consumption in SBR calculation. Furtherly, the ray tracing exploits the bounding volume hierarchy (BVH) technique [14], inherited from computer graphics, which is well suited for parallel computing on GPUs.
3 Multi-GPU parallel implementation
To improve the efficiency of EM scattering simulation of such scale target, the compute unified device architecture (CUDA) platform developed by NVIDIA is used to implement ray tracing solution. The simulations in this paper are all conducted on the computer server. The basic hardware parameters of the running platform are: Intel Xeon E5-2620 V4 CPU, dual-channel NVIDIA Titan XP GPU, and dual cards.
3.1 CPU-GPU parallelism
According to different computational difficulties of the three algorithms (i.e., PO, SBR, and PTD) in a high-frequency method, a CPU-GPU heterogeneous parallel computing scheme is proposed. Fig. 1 shows a schematic diagram of the parallel computing of the CPU and GPU heterogeneous scheme. CPU and GPU are interconnected through an external bus with its own off-chip memory. According to computing capabilities and characteristics of CPU and GPU, the computing strategy is designed as follows. Firstly, the PTD method is allocated to CPU for execution, and the OpenMP multi-thread acceleration technology is used for processing. Then, the scattering contribution $ {E}_{\mathrm{P}\mathrm{T}\mathrm{D}}^{s} $ of target edges for wedge problem can be obtained through CPU calculation. Secondly, the PO-SBR methods are allocated and executed on GPU, and are mapped to GPU for calculation through independent parallelizable ray tubes. Then, the scattering contribution $ {E}_{\mathrm{P}\mathrm{O}{\text{-}}\mathrm{S}\mathrm{B}\mathrm{R}}^{s} $ of the target surface can be obtained through GPU calculation. Generally, the PO-SBR process is time-consuming for the large-scale sea-ship scattering problem. The main thread of the entire algorithm is controlled by CPU scheduling. The total scattering fields are finally summed up, i.e., $ {E}^{s}={E}_{\mathrm{P}\mathrm{T}\mathrm{D}}^{s}+{E}_{\mathrm{P}\mathrm{O}{\text{-}}\mathrm{S}\mathrm{B}\mathrm{R}}^{s} $. The main thread of the entire algorithm is controlled by CPU scheduling.

Figure 1.CPU-GPU parallelism.
The GPU acceleration process in this application program mainly includes three steps: To transfer the input data from the CPU memory to the GPU device memory, to call GPU for execution, and to copy the output data from GPU memory back to CPU memory. During the programming implementation process, the above processes are implemented through a series of CUDA encapsulated storage management functions (such as, cudaMalloc( ), cudalMemcpx( ), and cudaFree( )) by the application program interface (API) [12].
3.2 Multi-GPU computing framework
In term of the parallel computing capability of GPU, a multi-GPU computing framework is further proposed to accelerate the tracing process. In order to deal with the multi-tasks in real time, a multi-process service (MPS) concurrency is used to control GPU calculations.
The MPS-based multi-GPU framework consists of task scheduling threads and GPU control threads. Multi-GPU control is implemented by reading GPU device information mounted on the host and then creating a thread object. Each thread uses cudaSetDevice() to specify the corresponding device for control. MPS allocates a unique context for the thread, which contains memory, modules, data streams, local kernel cache, and other information required for the operation of the device. And the CUDA kernel traces the ray according to the context information. After completing thread allocation by the main program, system scheduling is executed and tracing tasks are allocated in divided task segmentation by dispatching each task to the thread. The multi-GPU computing framework is shown in Fig. 2.

Figure 2.Multi-GPU parallel framework.
The composite model of sea-ship is used to verify the accuracy of the simulation and the rigorous numerical method. The simulation results of the multiple fast multipole method (MLFMA) provided by the commercial software are used as references. Limited by the computing capability of MLFMA, the size of the scaled ship model is set to $ 27\; \mathrm{m} \times 3.2\; \mathrm{m} \times 5 \; \mathrm{m} \; ( { {L}}\times { {W}}\times { { { {H}}}}) $, and the size of sea is $ 48\; \mathrm{m} \times 16 \; \mathrm{m}\; ( { {L}}\times { {W}}) $, as shown in Fig. 3. The wind speed is 5 m/s, and the seawater dielectric constant is assumed to be εr = 4−j8. The radar frequency is 400 MHz. The comparison results of radar mono-station scattering in the xoz plane are provided, including three simulations from MLFMA, high-frequency methods by single GPU and dual GPUs, respectively. The comparison results show that the simulation accuracy is in good agreement. Dual GPUs are the same as single GPU. Moreover, the averaged computation time per incident angle and frequency is about 395.2 s for the MLMFA simulation, 0.041 s for the single GPU simulation, and 0.028 s for the dual GPUs simulation, respectively. Therefore, the high-frequency method with GPU acceleration has remarkable computational efficiency. Finally, the radar echoes are simulated by using the proposed method.

Figure 3.Scaled ship on sea and its radar cross section comparison results: (a) model of scaled ship on sea, (b) vertical-vertical, and (c) horizontal-horizontal.
4 Range-Doppler imaging of ship
The range-Doppler (RD) algorithm is used to generate the radar image through the simulated radar echoes, the radar echoes of the sea-ship scene are from different azimuthal angles and different frequencies. When the frequency bandwidth $ B $ is small relative to the center frequency $ {f}_{c} $ and the rotation angle $ \beta $ in azimuth is small, an approximation can be carried out: $ B\ll {f}_{c} $, $ \mathrm{s}\mathrm{i}\mathrm{n}\beta \approx \beta $, and $ \mathrm{c}\mathrm{o}\mathrm{s}\beta \approx 1 $. It can be approximately considered that each scattering center at different slow time is still located within the same distance unit, and the phase influence of the echo only comes from the exponential part. By performing inverse fast Fourier transform (IFFT) on the slow-time dimension of the equation, the inverse synthetic aperture radar imaging (ISAR) is generated as
$ \rm{ISAR}=\sum_i^N \frac{2 \sigma_i T_p f_c \beta}{c} \exp \left[\frac{-\mathrm{j} 4 \pi f_c\left(x_i+d_r\right)}{c}\right] \rm{sinc}\left[\frac{2 B}{c}\left(x-x_i\right)\right] \rm{sinc}\left[\frac{2 f_c \beta}{c}\left(y-y_i\right)\right] $ (1)
where, $ {T}_{p} $ is the pulse width of the radar transmitting signal, $ c $ is the light speed, and $ {d}_{r} $ is the distance between the radar transmitting/receiving position and the origin of the target coordinate system. The range resolution and azimuth resolution in the formula are $ \Delta x=\dfrac{c}{2B} $ and $ \Delta y=\dfrac{c}{2{f}_{c}\beta } $ , respectively. Therefore, the basic process of ISAR imaging mainly includes:
1) Set the imaging parameters the bandwidth $ B $ and frequency interval $ \Delta f $ according to the range resolution $ \Delta x $, and rotation angle $ \beta $ in azimuth according to the azimuth resolution $ \Delta y $.
2) Transmit linear frequency modulation (LFM) pulses at each slow-time azimuth angle, and use the high-frequency method and GPU parallel acceleration technology proposed above to calculate the pulse-compressed echo at each azimuth angle and frequency.
3) Use the RD algorithm to perform pulse compression on the azimuth direction, and obtain the ISAR image through (1).
5 Simulation analysis of composite sea-ship scene
5.1 Radar cross section (RCS) simulation
Based on the proposed algorithm and parallel techniques, a fleet of cargo ships is used to verify the feasibility of the parallel implementation.
The fleet of ship scene includes 6 cargo ships. The ships are $ 200\;\mathrm{m} $ apart from the front and rear and $ 50\;\mathrm{m} $ from the left and right. The ship size is $ 140\;\mathrm{m}\times 18\;\mathrm{m}\times 17\;\mathrm{m} $$ (L\times W\times H) $. The sizes of the sea surface are set to $ 400\;\mathrm{m}\times 400\;\mathrm{m} \; (L\times W) $ and $ 800\;\mathrm{m}\times 800\;\mathrm{m}\; (L\times W) $, respectively. And the wind speed on sea is $ 5\;\mathrm{m}/\mathrm{s} $, as shown in Fig. 4. A mono-station scattering simulation is performed for the cargo ships, in which the radar frequency is $ 3\;\mathrm{G}\mathrm{H}\mathrm{z} $, the incident azimuth angle is 90°, and the pitch angle is 0°−89°. Fig. 5 shows the RCS results of the vertical-vertical (VV) and horizontal-horizontal (HH) polarization. Table 1 shows the efficiency of dual-GPU parallel computing acceleration ratio under different sea surface sizes. The results show that the average acceleration of the dual-GPU algorithm is about 200% compared with the single-GPU algorithm, demonstrating the effectiveness of electromagnetic scattering simulation in extremely large scenes.

Figure 4.A fleet of cargo ships.

Figure 5.RCS results at pitch angles and sea scenes: (a) 400 m×400 m sea and (b) 800 m×800 m sea.

Table 1. Time consumptions at different sea scenes.
Table 1. Time consumptions at different sea scenes.
Sea surface size (m2) | Single GPU (s) | Dual GPUs (s) | Speedup ratio (%) | 400×400 | 264.3 | 133.5 | 198 | 800×800 | 1179.6 | 575.6 | 205 |
|
5.2 Radar imaging of sea-ship scene
Furthermore, after obtaining the radar echo signal of the composite sea-ship targets, the ISAR imaging algorithm is used to obtain the radar images. A frigate ship target is used to analyze the impact of sea states on images, with a size of about $ 118\;\mathrm{m}\times 18\;\mathrm{m}\times 14\;\mathrm{m} \; (L\times W\times H) $. The sea surface size is $ 300\;\mathrm{m}\times 300\;\mathrm{m} \; (L\times W) $, and the wind speeds on sea are 0, 2 m/s, and 5 m/s, relative to the sea states calm, level 1, and level 2, respectively. The grazing angles are 40° and 20°. The radar scene parameters are shown in Table 2.

Table 2. Basic parameters of radar imaging.
Table 2. Basic parameters of radar imaging.
Parameters | Values | Target size (m) | 118×18×14 (L×W×H) | Sea size (m2) | 300×300 | Resolution ($ \Delta x{\mathrm{,}}\;\Delta y $) (m) | (0.75, 0.75) | Polarization | HH | Grazing angles (°) | 40, 20 | Azimuth angle (°) | 0, 90 | Bandwidth $ \left(B\right) $ (GHz) | 9.9−10.1 | Number of frequency points | 400 | Rotation angle $ \left(\beta \right) $(°) | 5 | Number of azimuth angle points | 400 |
|
Fig. 6 shows the radar imaging results. Furthermore, the averaged peak signal-to-noise ratios (PSNRs) under different sea states and different pitch angles are analyzed. The PSNR is the ratio of the strongest scattering point of target versus the averaged noise of the sea surface background. For each radar image, the value of the peak signal is from the strongest scattering point of target, and the averaged noise value is from the scattering background of the sea surface excluding the target of interest. The comparison results are shown in Table 3. Obviously, the more complicated the sea states are, the more it deteriorates the target imaging quality.

Figure 6.Comparison of ship imaging at different sea states and angles: (a) calm sea with a grazing angle of 40°, (b) calm sea with a grazing angle of 20°, (c) level 1 sea with a grazing angle of 40°, (d) level 1 sea with a grazing angle of 20°, (e) level 2 sea with a grazing angle of 40°, and (f) level 2 sea with a grazing angle of 20°.

Table 3. Averaged PSNRs at different sea states.
Table 3. Averaged PSNRs at different sea states.
Sea states (dB) | Grazing angle of 40° | Grazing angle of 20° | Calm | 35.84223 | 29.19082 | Level 1 | 26.81253 | 25.28394 | Level 2 | 25.08222 | 24.32826 |
|
The above simulations are all performed based on the GPU computer server. The number of radar imaging points in each scenario is 400 m × 400 m. The average simulation time of radar imaging data required for each image is about 1.8 h.
6 Conclusion
Aiming to accelerate the simulation of electromagnetic scatting of sea-ship scenes, this paper proposed a parallel computing technique based on the CPU-GPU parallel computing scheme and a multi-GPU framework for the high-frequency method. Simulation and imaging of composite super electromagnetic large sea-ship scene have been conducted to validate the feasibility of the proposed scheme. Simulation results demonstrate that the developed scheme can be used as a radar echo simulation tool to generate radar images and realize the analysis of sea-ship radar imaging at different sea states. Of course, scattering from sea-ship scenes is a very challenge problem in the field of electromagnetic scattering, especially for high level sea states. This paper only simulated the scattering at sea states level 1 and level 2. How to efficiently simulate at the circumstance of higher sea state is the work we are undergoing.
Disclosures
The authors declare no conflicts of interest.