Acta Optica Sinica, Volume 43, Issue 15, 1510002 (2023)

From Perception to Creation: Exploring Frontier of Image and Video Generation Methods

Liang Lin and Binbin Yang*
Author Affiliations
  • School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou 510006, Guangdong, China
    Figures & Tables (24)
    Overview of generative adversarial network (GAN) principle
    Overview of variational auto-encoder (VAE)
    Overview of flow-based generative model
    Overview of diffusion model (a minimal training sketch follows this figure list)
    Comparison of different generative models
    Overview of image and video generation models
    Class-conditioned image generation[95]
    Text-conditioned image generation[96]
    Text-to-image generation results of Stable Diffusion[97]
    Weight-tuning-based image customization[102]
    Token-learning-based image customization[103]
    Image customization with multi-concept composition[104]
    Mask-region-based text-to-image editing[105]
    Prompt-editing-based text-to-image editing[106]
    Embedding-interpolation-based text-to-image editing[107]
    Generated videos of Imagen Video[108]
    VIDM framework, which uses two diffusion models to generate video content and motion information, respectively[111]
    PVDM framework, which represents a video as three two-dimensional latent variables so that a two-dimensional diffusion model can be used for training[112]
    Text-to-video generation results of Make-A-Video[113]
    VideoFusion framework, which uses a pre-trained text-to-image diffusion model to generate the base frame and trains a residual noise generator on video data[114]
    Video editing based on input image or text prompt[115]
    One-shot text-to-video generation[116]
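
    The figure list above spans the four mainstream generative families (GAN, VAE, flow-based, and diffusion models, Figs. 1-4) and their image and video applications. As a reference point for the diffusion overview in Fig. 4, a minimal DDPM-style training step is sketched below; the denoiser interface, the step count T, and the linear beta schedule are illustrative assumptions, not details taken from the paper.

      import torch
      import torch.nn.functional as F

      T = 1000                                 # number of diffusion steps (assumed)
      betas = torch.linspace(1e-4, 0.02, T)    # linear noise schedule (assumed)
      alphas_bar = torch.cumprod(1.0 - betas, dim=0)

      def ddpm_loss(denoiser, x0):
          # Sample a random timestep per image and Gaussian noise
          t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
          noise = torch.randn_like(x0)
          # Closed-form forward (noising) process:
          # x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise
          a_bar = alphas_bar.to(x0.device)[t].view(-1, 1, 1, 1)
          x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
          # The denoiser is trained to predict the injected noise
          return F.mse_loss(denoiser(x_t, t), noise)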
    • Table 1. FID comparison of different text-to-image pre-trained models on the MS-COCO dataset (a sketch of the FID computation follows the table)

      Method         FID↓
      LAFITE[119]    26.94
      DALL·E[120]    17.89
      LDM[100]       12.63
      GLIDE[96]      12.24
      DALL·E2[97]    10.39
      Imagen[98]     7.27
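
    FID (Fréchet inception distance, lower is better) fits Gaussians to Inception-v3 features of real and generated images and measures the Fréchet distance between them. Below is a minimal NumPy sketch of the distance itself, with feature extraction omitted; the function name is ours, not from the paper.

      import numpy as np
      from scipy.linalg import sqrtm

      def frechet_distance(feats_real, feats_gen):
          # feats_*: (N, D) arrays of pooled Inception-v3 activations
          mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
          cov_r = np.cov(feats_real, rowvar=False)
          cov_g = np.cov(feats_gen, rowvar=False)
          # Matrix square root of the covariance product; drop tiny imaginary parts
          covmean = sqrtm(cov_r @ cov_g).real
          return float(((mu_r - mu_g) ** 2).sum()
                       + np.trace(cov_r + cov_g - 2.0 * covmean))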
    • Table 2. Performance comparison of different class-to-video generation methods on the UCF-101 and Sky Time-lapse datasets (sketches of the FVD and IS metrics follow the table; —: not reported)

      Method           FVD↓ (Sky Time-lapse, 256×256)   FVD↓ (UCF-101, 256×256)   FID↓ (UCF-101, 128×128)   IS↑ (UCF-101, 128×128)
      MoCoGAN[125]     206.6                            1821.4                    —                         12.42
      VideoGPT[126]    222.7                            2880.6                    —                         24.69
      MoCoGAN-HD[127]  164.1                            1729.6                    838                       32.36
      DIGAN[128]       83.1                             471.9                     655                       29.71
      VIDM[111]        57.4                             294.7                     306                       53.34
      PVDM[112]        55.4                             343.6                     —                         74.40
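
    In Table 2, FVD (Fréchet video distance) applies the same Fréchet distance as FID to the features of a pretrained video network (I3D), while IS (inception score, higher is better) rewards samples whose class posterior is confident yet diverse across the sample set. A minimal NumPy sketch of IS, assuming pre-computed classifier probabilities (a C3D classifier is the usual choice on UCF-101):

      import numpy as np

      def inception_score(probs, eps=1e-12):
          # probs: (N, C) softmax outputs of a pretrained classifier
          marginal = probs.mean(axis=0, keepdims=True)   # p(y)
          # IS = exp( E_x[ KL(p(y|x) || p(y)) ] )
          kl = (probs * (np.log(probs + eps) - np.log(marginal + eps))).sum(axis=1)
          return float(np.exp(kl.mean()))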
    Citation: Liang Lin, Binbin Yang. From Perception to Creation: Exploring Frontier of Image and Video Generation Methods[J]. Acta Optica Sinica, 2023, 43(15): 1510002

    Paper Information

    Category: Image Processing

    Received: Mar. 30, 2023

    Accepted: Jul. 22, 2023

    Published Online: Aug. 15, 2023

    The Author Email: Yang Binbin (yangbb3@mail2.sysu.edu.cn)

    DOI: 10.3788/AOS230758
