Acta Optica Sinica, Volume 43, Issue 15, 1510002 (2023)

From Perception to Creation: Exploring Frontier of Image and Video Generation Methods

Liang Lin and Binbin Yang*
Author Affiliations
  • School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou 510006, Guangdong, China

    Paper Information

    Category: Image Processing

    Received: Mar. 30, 2023

    Accepted: Jul. 22, 2023

    Published Online: Aug. 15, 2023

Corresponding Author Email: Binbin Yang (yangbb3@mail2.sysu.edu.cn)

DOI: 10.3788/AOS230758
