[Paper] Video Generative Adversarial Networks: A Review

Aldausari, Nuha; Sowmya, Arcot; Marcus, Nadine; Mohammadi, Gelareh

doi:10.1145/3487891

ACM Computing Surveys, vol. 55, no. 2, 2023, pp. 1-25

Aldausari, Nuha (School of Computer Science and Engineering, University of New South Wales, Sydney, Australia); Sowmya, Arcot (School of Computer Science and Engineering, University of New South Wales, Sydney, Australia); Marcus, Nadine (School of Computer Science and Engineering, University of New South Wales, Sydney, Australia); Mohammadi, Gelareh (School of Computer Science and Engineering, University of New South Wales, Sydney, Australia)

Abstract

With growing interest in content creation across sectors such as media, education, and entertainment, an increasing number of papers use AI algorithms to generate content such as images, videos, audio, and text. Generative Adversarial Networks (GANs) are one of the promising models for synthesizing data samples that resemble real data samples. While variations of GAN models in general have been covered to some extent in several survey papers, to the best of our knowledge, this is the first paper that reviews the state-of-the-art video GAN models. This paper first categorizes GAN review papers into general GAN review papers, image GAN review papers, and special-field GAN review papers covering areas such as anomaly detection, medical imaging, or cybersecurity. The paper then summarizes the main improvements in GANs that were not necessarily first applied in the video domain but have been adopted in multiple video GAN variations. Then, a comprehensive review of video GAN models is provided under two main divisions based on the existence of a condition. The conditional models are then further classified according to the provided condition into audio, text, video, and image. The paper concludes with the main challenges and limitations of current video GAN models.
