[Paper] YOLO, EAST: Comparison of Scene Text Detection Performance Using a Neural Network Model

박찬용; 임영민; 정승대; 조영혁; 이병철; 이규현; 김진욱

doi:10.3745/ktsde.2022.11.3.115

YOLO, EAST: Comparison of Scene Text Detection Performance Using a Neural Network Model (original Korean title: YOLO, EAST: 신경망 모델을 이용한 문자열 위치 검출 성능 비교)

KIPS Transactions on Software and Data Engineering (정보처리학회논문지. 소프트웨어 및 데이터 공학), v.11 no.3, 2022, pp.115-124

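박찬용 (TUAT Corp.), 임영민 (TUAT Corp.), 정승대 (TUAT Corp.), 조영혁 (TUAT Corp.), 이병철 (Gyeongsangbuk-do Economic Promotion Agency, Jobs and Industry Office), 이규현 (Kyungpook National University, School of Computer Science and Engineering), 김진욱 (Kyungpook National University, School of Computer Science and Engineering)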

Abstract (Korean original, translated)

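In this paper, we apply the YOLO and EAST neural networks, which have recently been widely used in many fields, to the problem of detecting text strings in images, and we compare and analyze their performance. YOLO networks are generally known to perform poorly at detecting text regions in images. Our experiments confirm that YOLOv3 is relatively weak at text detection, but the recently released YOLOv4 and YOLOv5 perform very well at detecting Korean and English text in a wide variety of images. We therefore expect these YOLO-based text detection methods to be widely used in the field of text recognition.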

Abstract (English)

In this paper, the YOLO and EAST models are tested to analyze their performance in detecting text areas in real-world and ordinary text images. Earlier YOLO models, including YOLOv3, have been known to underperform in detecting text areas in images, but the recently released YOLOv4 and YOLOv5 achieved promising performance in detecting text areas in a variety of images. Based on the experimental results, both YOLOv4 and YOLOv5 are expected to be widely used for text detection in the field of scene text recognition.

Tables/Figures (26)

Fig. 1. Examples of Scene Text Detection [3]
Fig. 2. Introduction of SullivanPlus
Fig. 3. SullivanPlus's Function Usage Frequency Comparison
Fig. 4. Performance Versus Speed on ICDAR 2015 Text Localization Challenge [9]
Fig. 5. Performance Comparison of YOLO and R-CNN [15]
Fig. 6. Comparison of YOLOv4 and Other Object Detectors [17]
Fig. 7. Comparison of YOLOv5 and Other Object Detectors [18]
Table 1. Pretrained Checkpoints of YOLOv5 Models [18]
Fig. 8. Boxing Examples of RBOX and AABB
Fig. 9. Examples of Training Data Set
Fig. 10. Intersection Over Union (IOU). Red is Predicted Bounding Box and Blue is Ground Truth Bounding Box
Fig. 11. Precision and Recall Explanation
Table 2. Evaluation Criteria Description
Table 3. EAST's Performance by Number of Training
Table 4. YOLOv4's Performance by Number of Training
Table 5. YOLOv5x's Performance by Number of Training
Table 6. English Text Detection Ratio After Training Eng. + Kor.
Table 7. Korean Text Detection Ratio After Training Eng. + Kor.
Fig. 12. Results of Text Detection in Documents
Fig. 13. Results of Scene Text Detection
Table 8. Ratio of Scene Text Detection After Training Eng. + Kor.
Table 9. Ratio of Scene Text Detection with Refined Data
Fig. 14. Example of Accurate Data Labelling
Fig. 15. Example of Inaccurate Data Labelling
Fig. 16. Examples of False Detection
Fig. 17. Demo Video of EAST and YOLO Neural Networks
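
The captions for Fig. 10 and Fig. 11 name Intersection over Union (IoU), precision, and recall as evaluation concepts. The following is a minimal sketch of how these quantities are commonly computed for axis-aligned boxes, not the paper's evaluation code. The `(x1, y1, x2, y2)` box format, the 0.5 IoU threshold, the greedy one-to-one matching, and the function names `iou` and `precision_recall` are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2), axis-aligned, with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height are clamped to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: List[Box], truths: List[Box], threshold: float = 0.5
) -> Tuple[float, float]:
    """Greedy matching: each prediction claims its best unmatched ground truth.

    A prediction counts as a true positive when that best IoU reaches the
    threshold (0.5 here is an assumed value, not taken from the paper).
    """
    matched = set()
    true_positives = 0
    for p in preds:
        best_j, best_score = -1, 0.0
        for j, t in enumerate(truths):
            if j in matched:
                continue
            score = iou(p, t)
            if score > best_score:
                best_j, best_score = j, score
        if best_j >= 0 and best_score >= threshold:
            matched.add(best_j)
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    return precision, recall
```

For example, calling `precision_recall` with two predicted boxes and three ground-truth boxes, where only one prediction overlaps a ground-truth box well enough, gives a precision of 1/2 and a recall of 1/3. Rotated boxes (the RBOX format named in Fig. 8) would need a polygon intersection instead of the rectangle arithmetic used here.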

References (20)

[1] Y. M. Baek, B. D. Lee, D. Y. Han, S. D. Yun, and H. S. Lee, "Character region awareness for text detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.9365-9374, 2019.
[2] T. Wang, T. Zhu, L. Jin, C. Luo, X. Chen, Y. Wu, and M. Cai, "Decoupled attention network for text recognition," in Proceedings of the AAAI Conference on Artificial Intelligence, Vol.34, No.7, pp.12216-12224, 2020.
[3] P. Lyu, C. Yao, W. Wu, S. Yan, and X. Bai, "Multi-oriented scene text detection via corner localization and region segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.7553-7563, 2018.
[4] Z. Tian, W. Huang, T. He, P. He, and Y. Qiao, "Detecting text in natural image with connectionist text proposal network," in European Conference on Computer Vision, Springer, Cham, pp.56-72, 2016.
[5] M. Liao, B. Shi, X. Bai, X. Wang, and W. Liu, "Textboxes: A fast text detector with a single deep neural network," in Thirty-first AAAI Conference on Artificial Intelligence, 2017.
[6] M. Liao, B. Shi, and X. Bai, "Textboxes++: A single-shot oriented scene text detector," IEEE Transactions on Image Processing, Vol.27, No.8, pp.3676-3690, 2018.
[7] F. Jiang, Z. Hao, and X. Liu, "Deep scene text detection with connected component proposals," arXiv preprint arXiv:1708.05133, 2017.
[8] Y. Jiang, et al., "R2cnn: rotational region cnn for orientation robust scene text detection," arXiv preprint arXiv:1706.09579, 2017.
[9] X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou, W. He, and J. Liang, "East: an efficient and accurate scene text detector," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.5551-5560, 2017.
[10] H. Hu, C. Zhang, Y. Luo, Y. Wang, J. Han, and E. Ding, "Wordsup: Exploiting word annotations for character based text detection," in Proceedings of the IEEE International Conference on Computer Vision, pp.4940-4949, 2017.
[11] S. Long, J. Ruan, W. Zhang, X. He, W. Wu, and C. Yao, "Textsnake: A flexible representation for detecting text of arbitrary shapes," in Proceedings of the European Conference on Computer Vision (ECCV), pp.20-36, 2018.
[12] T. He, Z. Tian, W. Huang, C. Shen, Y. Qiao, and C. Sun, "An end-to-end textspotter with explicit alignment and attention," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.5020-5029, 2018.
[13] P. Lyu, M. Liao, C. Yao, W. Wu, and X. Bai, "Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes," in Proceedings of the European Conference on Computer Vision (ECCV), pp.67-83, 2018.
[14] P. He, W. Huang, T. He, Q. Zhu, Y. Qiao, and X. Li, "Single shot text detector with regional attention," in Proceedings of the IEEE International Conference on Computer Vision, pp.3047-3055, 2017.
[15] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.779-788, 2016.
[16] S. Qin and R. Manduchi, "Cascaded segmentation-detection networks for word-level text spotting," in 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Vol.1, pp.1275-1282, 2017.
[17] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "Yolov4: Optimal speed and accuracy of object detection," arXiv preprint arXiv:2004.10934, 2020.
[18] G. Jocher, K. Nishimura, T. Mineeva, and R. Vilarino, "YOLOv5," GitHub repository [Internet], https://github.com/ultralytics/yolov5
[19] J. Redmon and A. Farhadi, "Yolov3: An incremental improvement," arXiv preprint arXiv:1804.02767, 2018.
[20] X. Wang, S. Zheng, C. Zhang, R. Li, and L. Gui, "R-YOLO: A real-time text detector for natural scenes with arbitrary rotation," Sensors, Vol.21, No.3, p.888, 2021.
