Block-modulating video compression: an ultralow complexity image compression encoder for resource-limited platforms

Siming Zheng; Yujia Xue; Waleed Tahir; Zhengjue Wang; Hao Zhang; Ziyi Meng; Gang Qu; Siwei Ma; Xin Yuan

doi:10.3788/AI.2024.10006

Advanced Imaging, Vol. 1, Issue 2, 021002 (2024)

Siming Zheng¹,†, Yujia Xue², Waleed Tahir², Zhengjue Wang³, Hao Zhang⁴, Ziyi Meng⁵, Gang Qu¹, Siwei Ma⁶, and Xin Yuan¹,*

Author Affiliations

¹Research Center for Industries of the Future (RCIF) and School of Engineering, Westlake University, Hangzhou, China

²Department of Electrical and Computer Engineering, Boston University, Boston, USA

³National Key Laboratory of Radar Signal Processing, Xidian University, Xi’an, China

⁴State Key Laboratory of Integrated Service Networks, Xidian University, Xi’an, China

⁵Westlake Intelligent Vision Technology, Hangzhou, China

⁶School of Computer Science, Peking University, Beijing, China

Figures & Tables (12)

Fig. 1. Pipeline of the proposed BMVC encoder. (a) Each input frame of size $N_{h} \times N_{w}$ is first divided into nonoverlapping blocks of size $B_{h} \times B_{w}$. Each block is then modulated (element-wise multiplication) by a binary mask of the same size. Next, the modulated blocks are summed to yield a single block of size $B_{h} \times B_{w}$. Finally, the summed block is quantized to a user-defined bit depth (8–16 bits) and transmitted to the receiver. (b) An equivalent encoding pipeline in which the modulation happens before the frame is divided into blocks.
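
The encoding steps in this caption can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the mask layout `(n_bh, n_bw, B_h, B_w)`, and the uniform quantizer (scaling by a caller-supplied `max_value`) are assumptions made for the example.

```python
import numpy as np

def block_modulate(image, masks):
    """Modulate each block by its binary mask and sum the results.

    image: 2D array of shape (N_h, N_w).
    masks: 0/1 array of shape (n_bh, n_bw, B_h, B_w), one mask per block.
    Returns the unquantized measurement of shape (B_h, B_w).
    """
    n_bh, n_bw, b_h, b_w = masks.shape
    if image.shape != (n_bh * b_h, n_bw * b_w):
        raise ValueError("image size must equal block grid times block size")
    # Split into non-overlapping blocks: (n_bh, n_bw, B_h, B_w).
    blocks = image.reshape(n_bh, b_h, n_bw, b_w).transpose(0, 2, 1, 3)
    # With 0/1 masks, modulation is a selection, so only additions remain.
    selected = np.where(masks.astype(bool), blocks, 0)
    return selected.sum(axis=(0, 1), dtype=np.int64)

def quantize(measurement, max_value, bit_depth):
    """Uniform quantization to 2**bit_depth levels (assumed scheme)."""
    levels = 2 ** bit_depth - 1
    scaled = measurement.astype(np.float64) / max_value * levels
    return np.round(scaled).astype(np.uint16)

# Toy example: an 8x12 frame, 4x4 blocks, hence a 2x3 grid of 6 blocks.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 12), dtype=np.uint8)
masks = rng.integers(0, 2, size=(2, 3, 4, 4), dtype=np.uint8)
y = block_modulate(image, masks)
code = quantize(y, max_value=6 * 255, bit_depth=10)
```

Because the masks are binary, the sketch selects pixels rather than multiplying them, which mirrors the zero-multiplication encoder cost listed for BMVC in Table 2.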

Fig. 2. Plug-and-play (PnP) optimization-based decoding algorithm for BMVC-PnP. The encoded image block and the binary modulation masks are fed into the BMVC-PnP decoder as inputs. The decoder iteratively alternates a linear projection step, which accounts for the BMVC encoding process, with a DL-based denoising step that acts as an implicit prior. We use a pretrained FFDNet [43] as the denoising CNN for its flexibility and robustness against various noise levels.
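
The alternation between projection and denoising might look roughly like the sketch below. Several things here are assumptions rather than details from the paper: the projection uses the closed-form update common in snapshot compressive imaging (valid because the masks are binary), a 3x3 mean filter stands in for the pretrained FFDNet, and all function names and the mask layout `(n_blocks, B_h, B_w)` are hypothetical.

```python
import numpy as np

def forward(blocks, masks):
    """BMVC forward model: masked blocks summed into one measurement."""
    return (masks * blocks).sum(axis=0)

def project(blocks, masks, y):
    """Data-consistency step: make forward(blocks, masks) match y."""
    mask_sum = np.maximum(masks.sum(axis=0), 1)  # avoid division by zero
    residual = (y - forward(blocks, masks)) / mask_sum
    return blocks + masks * residual

def box_denoise(frame):
    """Placeholder denoiser (3x3 mean filter), standing in for FFDNet."""
    padded = np.pad(frame, 1, mode="edge")
    h, w = frame.shape
    shifted = [padded[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return sum(shifted) / 9.0

def to_frame(blocks, grid):
    """Tile (n_blocks, B_h, B_w) blocks back into the full frame."""
    n_bh, n_bw = grid
    _, b_h, b_w = blocks.shape
    tiled = blocks.reshape(n_bh, n_bw, b_h, b_w).transpose(0, 2, 1, 3)
    return tiled.reshape(n_bh * b_h, n_bw * b_w)

def to_blocks(frame, grid, block_hw):
    """Inverse of to_frame."""
    n_bh, n_bw = grid
    b_h, b_w = block_hw
    tiled = frame.reshape(n_bh, b_h, n_bw, b_w).transpose(0, 2, 1, 3)
    return tiled.reshape(n_bh * n_bw, b_h, b_w)

def pnp_decode(y, masks, grid, n_iter=20):
    """Alternate projection and denoising, starting from a scaled y."""
    blocks = masks * (y / np.maximum(masks.sum(axis=0), 1))
    for _ in range(n_iter):
        blocks = project(blocks, masks, y)
        # Denoise the whole frame so the prior sees cross-block structure.
        frame = box_denoise(to_frame(blocks, grid))
        blocks = to_blocks(frame, grid, masks.shape[1:])
    return to_frame(blocks, grid)
```

The mean filter is only there to keep the example self-contained; a learned denoiser with a tunable noise level would take its place in a real decoder.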

Fig. 3. End-to-end (E2E) neural-network-based decoding algorithm for BMVC-E2E. The encoded image block and the binary modulation masks are fed into the BMVC-E2E decoder as inputs. The feed-forward BMVC-E2E decoder consists of several stages, each containing a linear projection step and a convolutional neural network. All BMVC-E2E decoders are trained in an E2E fashion. We use a 2D U-Net and a 3D-CNN with reversible blocks (RevSCI) to facilitate memory-efficient training.

Fig. 4. Test data set (set 13) we used to evaluate the BMVC pipeline and other compression methods.

Fig. 5. Decoded image results at various Crs with the proposed BMVC-PnP and BMVC-E2E approaches. The BMVC-E2E results consistently have good decoding quality at both low and high Crs. The BMVC-PnP decoder provides higher image quality for low Crs while producing some denoising artifacts at high Crs.

Fig. 6. Comparison of the BMVC pipeline with other image compression methods: random DS, block CS, and JPEG2000 compression. For the random DS and block CS experiments, we implemented their decoders based on the PnP algorithm with FFDNet as the flexible denoiser. Results are shown with a low $Cr = 32$ and a high $Cr = 80$. The two BMVC decoders provide consistently high-quality images. The random DS method fails because, in principle, the random DS decoder is solving an image inpainting task with only $< 3\%$ of the pixels available. Block CS shows equally good decoding results here, but, as shown later, its quality deteriorates under aggressive data quantization. As expected, JPEG2000 compression gives the best image quality at $Cr = 32$ and 72 with minor blurriness, since JPEG2000 utilizes an optimal set of wavelet bases, but at the cost of increased encoding computation.

Fig. 7. PSNR performance of different compression methods at a wide range of Crs. PSNR value is computed for Y channels only. The BMVC-E2E has a PSNR increase at $Cr = 60$ . This is because the BMVC-E2E decoder has an additional stage of 3D-CNN for all $Cr \geq 60$ .
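
A Y-channel PSNR of the kind mentioned in this caption could be computed as in the sketch below. The caption does not say which RGB-to-luma conversion was used, so the BT.601 weights, the 8-bit peak value of 255, and the function names are assumptions.

```python
import numpy as np

def rgb_to_y(rgb):
    """Luma from RGB using BT.601 weights (assumed conversion)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def psnr_y(reference_rgb, decoded_rgb, peak=255.0):
    """PSNR in dB between the luma channels of two RGB images."""
    diff = rgb_to_y(reference_rgb) - rgb_to_y(decoded_rgb)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))
```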

Fig. 8. Evaluation of robustness to quantization bits. BMVC and block CS both show high PSNR performance when the dynamic range of the data is intact. In practice, quantization will affect the codec performance in real-world video signal transmission. The bar plots indicate how the three decoders (BMVC-PnP, BMVC-E2E, and block CS) perform under different quantization bits. BMVC decoders have consistent performance regardless of data quantization. However, block CS has noticeable decreases in PSNR at 10-bit and 8-bit quantization.

Table 1. PSNR (in dB, first value in each cell) and SSIM (second value in each cell) Performance for Different Compression Methods at a Wide Range of Crs on the HD Image of Size 1080×1920. The highest performance of CS-based methods is bold-faced.

Cr ($N_{b}$)	150	120	100	80	72	60	50	40	32	24
BMVC block size	$108 \times 128$	$108 \times 160$	$108 \times 192$	$108 \times 240$	$120 \times 240$	$216 \times 160$	$216 \times 192$	$216 \times 240$	$270 \times 240$	$270 \times 320$
BMVC-PnP	22.89, 0.682	24.32, 0.707	25.97, 0.745	27.30, 0.781	27.96, 0.797	28.83, 0.822	29.87, 0.848	31.10, 0.877	32.23, 0.895	33.58, 0.913
BMVC-E2E	26.59, 0.802	27.38, 0.815	27.80, 0.821	28.81, 0.837	29.31, 0.849	30.74, 0.871	29.08, 0.839	29.23, 0.843	29.98, 0.854	31.05, 0.871
Random DS	8.95, 0.354	9.56, 0.401	10.10, 0.431	10.88, 0.460	11.33, 0.472	12.17, 0.491	13.30, 0.516	14.87, 0.555	16.61, 0.609	18.58, 0.690
Block CS	26.59, 0.819	26.93, 0.826	27.65, 0.837	27.88, 0.843	28.20, 0.849	28.70, 0.857	29.38, 0.865	29.83, 0.870	30.70, 0.876	31.79, 0.884
JPEG2000	30.30, 0.852	30.94, 0.862	31.42, 0.869	31.99, 0.877	32.80, 0.888	33.41, 0.897	33.84, 0.902	34.60, 0.911	36.45, 0.931	37.51, 0.941
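
The Cr values in the header follow from the block sizes: for BMVC, Cr equals the number of blocks $N_{b}$ that tile the 1080×1920 frame. The short check below illustrates that arithmetic; the function name is hypothetical.

```python
def compression_ratio(frame_hw, block_hw):
    """Cr = N_b, the number of non-overlapping blocks tiling the frame."""
    (n_h, n_w), (b_h, b_w) = frame_hw, block_hw
    if n_h % b_h or n_w % b_w:
        raise ValueError("block size must divide the frame size")
    return (n_h // b_h) * (n_w // b_w)

# Block sizes listed in Table 1 for a 1080 x 1920 frame, keyed by Cr.
table1_blocks = {
    150: (108, 128), 120: (108, 160), 100: (108, 192), 80: (108, 240),
    72: (120, 240), 60: (216, 160), 50: (216, 192), 40: (216, 240),
    32: (270, 240), 24: (270, 320),
}
computed = {cr: compression_ratio((1080, 1920), block)
            for cr, block in table1_blocks.items()}
```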

Table 2. Computation Cost of Different Compression Methods.

Codec	Encoder Cost	Dynamic Range of Measured Data	Mask/Basis Size	Decoder	Comment
BMVC	# additions: $\frac{(N_{b} - 2) N}{2 N_{b}}$; # multiplications: 0	$\frac{N_{b}}{2}$	$N$	PnP: iterative network; E2E: feedforward network	$Cr = N_{b}$; $N_{b}$: # blocks; $N$: # pixels
Random DS	# additions: 0; # multiplications: 0	1	$N$	PnP: iterative network	$Cr = \frac{N}{N_{s}}$; $N_{s}$: # sampled pixels
Block CS	# additions: $\frac{N}{2} - N_{b}$; # multiplications: 0	$\frac{N}{2 N_{b}}$	$\frac{M N}{N_{b}}$	PnP: iterative network	$Cr = \frac{N}{M N_{b}}$; $M$: # measurements per block
JPEG2000	# additions: $(N - N_{b}) \log_{2} \frac{N}{N_{b}}$; # multiplications: $N \log_{2} \frac{N}{N_{b}}$	1	flexible	Discrete wavelet transform (DWT)	$N_{b} = \frac{N}{B^{2}}$; $B$: block size
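
To get a feel for these formulas, the sketch below evaluates them for one 1080×1920 frame. The function names are hypothetical, and the JPEG2000 block size $B = 8$ is an arbitrary value chosen for illustration, not one stated in the table.

```python
import math

def bmvc_additions(n_pixels, n_blocks):
    """Table 2, BMVC row: (N_b - 2) * N / (2 * N_b) additions."""
    return (n_blocks - 2) * n_pixels / (2 * n_blocks)

def block_cs_additions(n_pixels, n_blocks):
    """Table 2, Block CS row: N / 2 - N_b additions."""
    return n_pixels / 2 - n_blocks

def jpeg2000_ops(n_pixels, block_size):
    """Table 2, JPEG2000 row with N_b = N / B**2: (additions, multiplications)."""
    n_blocks = n_pixels / block_size ** 2
    depth = math.log2(n_pixels / n_blocks)
    return (n_pixels - n_blocks) * depth, n_pixels * depth

N = 1080 * 1920                          # pixels in one HD frame
bmvc_adds = bmvc_additions(N, 100)       # BMVC at Cr = N_b = 100
cs_adds = block_cs_additions(N, 100)
jp2_adds, jp2_mults = jpeg2000_ops(N, 8)  # B = 8 is a hypothetical choice
```

Under these assumptions BMVC needs about 1.0 million additions and no multiplications per frame, while the JPEG2000 formulas give roughly 12 million of each.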

Table 3. PSNRs of BMVC-PnP, BMVC-E2E, and Block CS under Different Quantization Bits.

Decoder	BMVC-PnP (dB)	BMVC-PnP (dB)	BMVC-PnP (dB)	BMVC-PnP (dB)	BMVC-E2E (dB)	BMVC-E2E (dB)	BMVC-E2E (dB)	BMVC-E2E (dB)	Block CS (dB)	Block CS (dB)	Block CS (dB)	Block CS (dB)
Bit/Cr	100	80	50	24	100	80	50	24	100	80	50	24
8-bit	25.921	27.274	29.829	33.451	27.785	28.796	29.060	31.023	27.190	27.049	27.361	29.563
10-bit	25.981	27.305	29.882	33.592	27.805	28.813	29.082	31.056	27.634	27.957	29.361	31.931
12-bit	25.981	27.301	29.889	33.611	27.803	28.818	29.087	31.059	27.800	28.123	30.090	33.132
14-bit	25.982	27.303	29.892	33.613	27.802	28.819	29.088	31.060	27.813	28.132	30.099	33.149
16-bit	25.982	27.304	29.879	33.576	27.803	28.818	29.088	31.060	27.814	28.133	30.102	33.151
ΔPSNR↓	0.0606	0.0315	0.0629	0.1627	0.0202	0.0230	0.0282	0.0370	0.6238	1.0844	2.7415	3.5872
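
The ΔPSNR row is not defined on this page. It appears to be the spread (maximum minus minimum) of each column over the five bit depths, and the check below on the Block CS columns is consistent with that reading to within the three-decimal rounding of the table. The variable names are illustrative.

```python
# Block CS PSNR values from Table 3, one list per bit depth,
# ordered by Cr = 100, 80, 50, 24.
block_cs_psnr = {
    8:  [27.190, 27.049, 27.361, 29.563],
    10: [27.634, 27.957, 29.361, 31.931],
    12: [27.800, 28.123, 30.090, 33.132],
    14: [27.813, 28.132, 30.099, 33.149],
    16: [27.814, 28.133, 30.102, 33.151],
}
reported_delta = [0.6238, 1.0844, 2.7415, 3.5872]

# Transpose so that each tuple holds one Cr column across all bit depths.
columns = list(zip(*block_cs_psnr.values()))
spread = [max(col) - min(col) for col in columns]
```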

Table 4. Ablation Study of BMVC-E2E (Cr≤50).

Decoder Structure	PSNR (dB)	Runtime (ms)
2 × U-Net + 2 × 3D-CNN (RevSCI) + 1 × deeper 3D CNN (RevSCI)	31.720	$\sim 255$
2 × U-Net + 2 × 3D-CNN (RevSCI)	31.449	$\sim 130$
2 × U-Net + 1 × 3D-CNN (RevSCI)	29.228	$\sim 70$
2 × U-Net	17.930	$\sim 10$
1 × U-Net	10.308	$\sim 5$

Siming Zheng, Yujia Xue, Waleed Tahir, Zhengjue Wang, Hao Zhang, Ziyi Meng, Gang Qu, Siwei Ma, Xin Yuan, "Block-modulating video compression: an ultralow complexity image compression encoder for resource-limited platforms," Adv. Imaging 1, 021002 (2024)

Paper Information

Category: Research Article

Received: Jan. 22, 2024

Accepted: Jul. 9, 2024

Published Online: Aug. 8, 2024

Author Email: Xin Yuan (xyuan@westlake.edu.cn)
