Semi-supervised learning uses a small amount of labeled data both to predict the labels of unlabeled data and to improve clustering performance, whereas unsupervised learning clusters unlabeled data alone. We propose a new clustering-based semi-supervised learning method whose objective function incorporates initial label predictions for the unlabeled data. Each initial prediction is expressed as a discrete probability distribution over the classes, obtained from a classification method trained on the labeled data. Clusters are then formed, and the label of each unlabeled point is predicted from the labeled data in the same cluster. We evaluate the proposed method and compare its performance, measured by classification error, in numerical experiments on labeled data whose labels are blinded.
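The pipeline described above can be illustrated with a minimal sketch. The abstract does not give the objective function, so everything concrete here is an assumption made for illustration: the initial classifier is a soft nearest-class-mean rule, the objective is a k-means objective with an added penalty `lam` on the squared distance between each point's class distribution and its cluster's mean distribution, and the function names, `lam`, and the fallback rule are hypothetical. The sketch also assumes every class has at least one labeled point.

```python
import numpy as np


def initial_probabilities(X_lab, y_lab, X_unl, n_classes):
    """Assumed initial classifier: softmax over negative squared distances
    to the class means of the labeled data (one probability row per point)."""
    means = np.stack([X_lab[y_lab == c].mean(axis=0) for c in range(n_classes)])
    d2 = ((X_unl[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def semi_supervised_cluster(X_lab, y_lab, X_unl, n_classes, n_clusters,
                            lam=1.0, n_iter=50, seed=0):
    """Cluster labeled and unlabeled points together, then label each
    unlabeled point from the labeled points in its cluster."""
    P_unl = initial_probabilities(X_lab, y_lab, X_unl, n_classes)
    P_lab = np.eye(n_classes)[y_lab]  # labeled points get one-hot distributions

    # Assumed objective: sum ||x_i - mu_k||^2 + lam * ||p_i - q_k||^2.
    # This equals plain k-means on features augmented with sqrt(lam) * p_i.
    Z = np.hstack([np.vstack([X_lab, X_unl]),
                   np.sqrt(lam) * np.vstack([P_lab, P_unl])])

    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        new_centers = centers.copy()
        for k in range(n_clusters):
            if np.any(assign == k):  # keep the old center if a cluster empties
                new_centers[k] = Z[assign == k].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    n_lab = len(X_lab)
    # Fallback (an assumption): keep the initial prediction when a cluster
    # contains no labeled points.
    pred = P_unl.argmax(axis=1)
    for k in range(n_clusters):
        labels_in_k = y_lab[assign[:n_lab] == k]
        if labels_in_k.size:
            members = assign[n_lab:] == k
            pred[members] = np.bincount(labels_in_k, minlength=n_classes).argmax()
    return pred, assign
```

Calling `semi_supervised_cluster(X_lab, y_lab, X_unl, n_classes=2, n_clusters=2)` returns the predicted labels for the rows of `X_unl` and the cluster index of every point, labeled rows first. Setting `lam=0` reduces the sketch to ordinary k-means followed by majority voting, which makes the contribution of the initial predictions easy to isolate.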