[International paper] A Survey on Data Collection for Machine Learning: A Big Data - AI Integration Perspective

IEEE Transactions on Knowledge and Data Engineering, vol. 33, no. 4, 2021, pp. 1328-1347

Roh, Yuji; Heo, Geon; Whang, Steven Euijong (all: Korea Advanced Institute of Science and Technology, School of Electrical Engineering, Daejeon, Korea)

Abstract

Data collection is a major bottleneck in machine learning and an active research topic in multiple communities. There are largely two reasons data collection has recently become a critical issue. First, as machine learning is becoming more widely used, we are seeing new applications that do not nece...

Open access (OA) type

GREEN

The author has self-archived the published version, post-print, or pre-print in a public repository, so the paper is freely available.