Journal of KIISE (정보과학회논문지), v.44 no.1, 2017, pp.63-70
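최상희 (S. H. Choe; Department of Computer Engineering, Sogang University), 장형수 (H. S. Chang; Department of Computer Engineering, Sogang University)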
This paper considers the problem of combining multiple strategies for solving sleeping bandit problems with stochastic rewards and stochastic availability. It also proposes an algorithm, called sleepComb(