[Patent] Thompson strategy based online reinforcement learning system for action selection

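IPC classification information
Country/Type	United States (US) Patent, granted
International Patent Classification (IPC 7th ed.)	G06N-005/04 G06N-007/00 G06N-007/02
Application number	UP-0169503 (2005-06-29)
Registration number	US-7707131 (2010-05-20)
Inventors / Address	Chickering, David M.; Paek, Timothy S.; Horvitz, Eric J.
Applicant / Address	Microsoft Corporation
Agent / Address	Lee & Hayes, PLLC
Citation information	Times cited: 14; Patents cited: 67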

Abstract

A system and method for online reinforcement learning is provided. In particular, a method for performing the explore-vs.-exploit tradeoff is provided. Although the method is heuristic, it can be applied in a principled manner while simultaneously learning the parameters and/or structure of the model (e.g., a Bayesian network model). The system includes a model which receives an input (e.g., from a user) and provides a decision engine with a probability distribution expressing uncertainty about the parameters of the model. The decision engine can determine whether to exploit the information known to it or to explore to obtain additional information based, at least in part, upon the explore-vs.-exploit tradeoff (e.g., the Thompson strategy). A reinforcement learning component can obtain additional information (e.g., feedback from a user) and update parameter(s) and/or the structure of the model. The system can be employed in scenarios in which an influence diagram is used to make repeated decisions and maximization of long-term expected utility is desired.
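The abstract describes a loop: the model keeps a distribution over its own parameters, the decision engine acts on that distribution with the Thompson strategy, and the reinforcement learning component folds feedback back into the parameters. The following is a minimal Python sketch of that loop, assuming a one-decision influence diagram whose single chance node (the outcome) is discrete and has a Dirichlet prior per action. `ThompsonAgent`, `simulate`, the action names and the probabilities are hypothetical illustrations, not the patented implementation. Structure learning and the continuous (Normal-Wishart) case are omitted.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


class ThompsonAgent:
    """Hypothetical one-decision influence diagram: action -> outcome -> utility.

    counts[action][k] holds the Dirichlet hyperparameters of P(outcome = k | action),
    i.e. the uncertainty over the model's parameters.
    """

    def __init__(self, actions, utilities, prior=1.0, seed=None):
        self.actions = list(actions)
        self.utilities = list(utilities)  # utility of each discrete outcome
        self.counts = {a: [prior] * len(self.utilities) for a in self.actions}
        self.rng = random.Random(seed)

    def select_action(self):
        # Thompson strategy: draw one parameter set from the posterior, then pick the
        # action that maximizes expected utility as if that draw were the truth.
        best_action, best_value = None, float("-inf")
        for a in self.actions:
            theta = sample_dirichlet(self.counts[a], self.rng)
            value = sum(p * u for p, u in zip(theta, self.utilities))
            if value > best_value:
                best_action, best_value = a, value
        return best_action

    def update(self, action, outcome):
        # Conjugate Dirichlet update: the observed outcome gains one pseudo-count.
        self.counts[action][outcome] += 1.0


def simulate(agent, true_probs, steps, seed=0):
    """Run the select/observe/update loop against a simulated user; return pick counts."""
    env = random.Random(seed)
    picks = {a: 0 for a in agent.actions}
    for _ in range(steps):
        action = agent.select_action()
        picks[action] += 1
        # Draw the outcome index from the hidden true distribution for this action.
        outcome = env.choices(range(len(true_probs[action])), weights=true_probs[action])[0]
        agent.update(action, outcome)
    return picks


if __name__ == "__main__":
    # Hypothetical dialog actions; outcome 1 ("task succeeded") has utility 1.
    true_probs = {"confirm": [0.3, 0.7], "ask_again": [0.6, 0.4]}
    agent = ThompsonAgent(true_probs, utilities=[0.0, 1.0], seed=0)
    print(simulate(agent, true_probs, steps=2000))
```

Acting on a single posterior draw per decision, rather than on the posterior mean, is what makes the agent explore: an action with few observations has a wide posterior and occasionally yields the highest sampled utility, so it keeps getting tried until the evidence rules it out.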

Representative claims

What is claimed is:
1. An online reinforcement learning system comprising components embodied on a computer readable storage medium, the components when executed by one or more processors, update a model based upon reinforcement learning, the components comprising: a model comprising an influence diagram with at least one chance node, the model receiving an input and providing a probability distribution associated with uncertainty regarding parameters of the model; a decision engine that selects an action based, at least in part, upon the probability distribution, the decision engine employing a Thompson strategy heuristic technique to maximize long term expected utility when selecting the action, wherein the decision engine decreases a variance of a distribution of the parameters as a last decision instance is approached; and a computer-implemented reinforcement learning component that modifies at least one of the parameters of the model based upon feedback associated with the selected action, the parameters defining distributions over discrete variables and continuous variables, uncertainty of the parameters expressed using Dirichlet priors for conditional distributions of discrete variables of the model, and, Normal-Wishart priors for distributions of continuous variables of the model, wherein the modified model is stored.
2. The system of claim 1, used when the parameters of the model are changing over time.
3. The system of claim 1, wherein the decision engine employs a maximum a posterior of the parameters when there is only one more decision instance remaining.
4. The system of claim 1, wherein the decision engine artificially increases the variance of a distribution of the parameters.
5. The system of claim 1, wherein the computer-implemented reinforcement learning component further modifies the structure of the model based, at least in part, upon the feedback associated with the selected action.
6. The system of claim 1, wherein the feedback comprises an input from a user of the system.
7. The system of claim 6, wherein the input from the user comprises a verbal utterance.
8. The system of claim 1, wherein the feedback comprises a lack of an input from a user of the system in a threshold period of time.
9. The system of claim 1, where one or more parameters of the model change over a period of time.
10. The system of claim 1, the parameters defining distributions over variables, where the variables comprise chance variables, decision variables and/or value variables.
11. The system of claim 1, employed repeatedly to facilitate decision making.
12. The system of claim 11, wherein the parameter(s) are updated prior to a next repetition.
13. The system of claim 1, the model comprising a Markov decision process represented as an Influence diagram.
14. The system of claim 1 employed as part of a dialog system.
15. An online reinforcement learning method comprising: determining a probability distribution associated with uncertainty regarding parameters of a model, the model comprising an influence diagram with at least one chance node; employing a computer-implemented Thompson strategy heuristic technique to select an action based, at least in part, upon the probability distribution, wherein a variance of a distribution of the parameters is artificially increased to be large enough that the model continues to adapt; updating at least one parameter of the model based, at least in part, upon feedback associated with the selected action, the parameters defining distributions over discrete variables and continuous variables, uncertainty of the parameters expressed using Dirichlet priors for conditional distributions of discrete variables of the model, and, Normal-Wishart priors for distributions of continuous variables of the model; and storing the updated model on a computer readable storage medium.
16. The method of claim 15, wherein the feedback comprises an input from a user or a lack of an input from the user in a threshold period of time.
17. A computer readable medium having stored thereon computer executable instructions for carrying out the method of claim 15.
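Claims 1, 3, 4 and 15 add controls on the variance of the parameter distribution that the decision engine samples from. The following is one possible way to realize them with Dirichlet priors, offered as a sketch under stated assumptions rather than the claimed method. Multiplying all hyperparameters by a factor c keeps the mean fixed and shrinks each component's variance to alpha_k (A - alpha_k) / (A^2 (cA + 1)), where A is the sum of the hyperparameters. Capping A keeps the variance from collapsing, so the model can still respond to parameters that change over time. The last decision uses the Dirichlet mode (the MAP estimate). The schedule `1 + sharpen / (remaining - 1)`, the cap `max_strength` and all function names are hypothetical.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def scale_concentration(alphas, factor):
    """Multiply every hyperparameter by `factor`: same mean, variance shrinks as factor grows."""
    return [a * factor for a in alphas]


def cap_strength(alphas, max_strength):
    """Hypothetical variance inflation: rescale so that sum(alphas) <= max_strength."""
    total = sum(alphas)
    if total <= max_strength:
        return list(alphas)
    return scale_concentration(alphas, max_strength / total)


def map_estimate(alphas):
    """Dirichlet mode (MAP estimate).

    Falls back to the posterior mean when some alpha_k <= 1, where the mode is not in
    the interior of the simplex (an assumption of this sketch).
    """
    total = sum(alphas)
    if all(a > 1.0 for a in alphas):
        return [(a - 1.0) / (total - len(alphas)) for a in alphas]
    return [a / total for a in alphas]


def parameters_for_decision(alphas, remaining, rng, sharpen=10.0, max_strength=None):
    """Return the parameter vector the decision engine should act on.

    remaining: decision instances left, counting this one (hypothetical bookkeeping).
    sharpen: hypothetical schedule strength; 0 gives plain Thompson sampling.
    max_strength: optional cap on sum(alphas) so the posterior never stops adapting.
    """
    if max_strength is not None:
        alphas = cap_strength(alphas, max_strength)
    if remaining <= 1:
        # Only one decision left: nothing learned afterwards can pay off, so exploit.
        return map_estimate(alphas)
    # Factor grows from about 1 (many decisions left) to 1 + sharpen (two left).
    factor = 1.0 + sharpen / (remaining - 1)
    return sample_dirichlet(scale_concentration(alphas, factor), rng)
```

Capping the total only at decision time keeps the sampled variance from collapsing, so exploration never stops entirely. To let the posterior mean itself follow drifting parameters, the same `cap_strength` could instead be applied to the stored counts after each update, so that new feedback outweighs old; leaving `max_strength` as `None` recovers ordinary Bayesian updating.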

Patents cited by this patent (67)

Hakkani Tur,Dilek Z.; Rahim,Mazin G.; Riccardi,Giuseppe; Tur,Gokhan, Active learning process for spoken dialog systems.
Rouquie Gilbert J. A. (Redwood City CA), Communication between prolog and an external process.
Bellegarda Jerome R. (Goldens Bridge NY) Kanevsky Dimitri (Ossining NY), Computer program product for automatic recognition of a consistent message using multiple complimentary sources of infor.
Robarts, James O.; Matteson, Eric L., Contextual responses based on automated learning techniques.
Chang, Daniel T.; Cheng, Josephine M.; Chow, Jyh-Herng; Xu, Jian, Database extender for storing, querying, and retrieving structured documents.
Gaines R. Stockton, Distributed and portable execution environment.
Hussey Peter, Electronic mail interface for a network server.
Zinky, John A.; Schantz, Richard R.; Bakken, David E.; Loyall, Joseph P., Framework for providing quality of service requirements in a distributed object-oriented computer system.
Borgida Alexander Tiberiu ; Brachman Ronald Jay ; Kirk Thomas ; Selfridge Peter Gilman ; Terveen Loren Gilbert, Interactive data analysis employing a knowledge base.
Underwood, Roy Aaron, Interfacing servers in a Java based e-commerce architecture.
Abbott, Kenneth H.; Newell, Dan; Robarts, James O.; Freedman, Joshua M., Managing interactions between computer users' context models.
Abbott, Kenneth H.; Newell, Dan; Robarts, James O.; Freedman, Joshua M.; Apacible, Johnson, Mediating conflicts in computer user's context data.
Nelson Michael N. (San Carlos CA) Khalidi Yousef A. (Sunnyvale CA), Method and apparatus for a caching file server.
Chen Steve S. (Eau Claire WI) Beard Douglas R. (Eleva WI) Spix George A. (Eau Claire WI) Priest Edward C. (Eau Claire WI) Wastlick John M. (Eau Claire WI) VanDyke James M. (Eau Claire WI), Method and apparatus for a unified parallel processing architecture.
Dion David ; Khalidi Yousef A. ; Talluri Madhusudhan ; Swaroop Anil, Method and apparatus for file system disaster recovery.
Khalidi Yousef A. ; Talluri Madhusudhan ; Dion David ; Swaroop Anil, Method and apparatus for file system disaster recovery.
Fast Ronald Wayne (Bellevue WA), Method and apparatus for generating database queries from a meta-query pattern.
Digalakis Vassilios,GRX ; Neumeyer Leonardo ; Rtischev Dimitry, Method and apparatus for speech recognition adapted to an individual speaker.
Hamilton Graham ; Powell Michael L. ; Mitchell James G. ; Gibbons Jonathan J., Method and apparatus for subcontracts in distributed processing systems.
Dan Newell ; Kenneth H. Abbott, III, Method and system for controlling presentation of information to a user based on the user's condition.
Combs, Charles; Gold, Jeffrey; Mair, Brian; Pedersen, David; Schear, David, Method and system for load-balanced data exchange in distributed network-based resource allocation.
Kampe,Mark A.; Herrmann,Frederic; Nguyen,Gia Khanh; Shokri,Eltefaat H., Method and system for managing high-availability-aware components in a networked computer system.
Friedman Marc T. ; Kwok Chung T. ; Weld Daniel S., Method and system for network information access.
Kolfman, Michael, Method and system for storing and retrieving documents.
Kenton, Stephen J., Method and system for straight through processing.
Brereton JoAnn Piersa ; Coden Anna Rosa ; Schwartz Michael Stephen, Method and system for translating an ad-hoc query language using common table expressions.
O'Neill Maureen K., Method for compilation using a database for target language independence.
Matsuzawa Hirofumi,JPX ; Fukuda Takeshi,JPX, Method for executing aggregate queries, and computer system.
Khalidi Yousef A. (Sunnyvale CA) Hamilton Graham (Palo Alto CA) Kougiouris Panagiotis S. (Mountain View CA), Method for executing operation call from client application using shared memory region and establishing shared memory re.
Theimer Marvin M. (Mountain View CA) Spreitzer Michael J. (Tracy CA) Weiser Mark D. (Palo Alto CA) Goldstein Richard J. (San Francisco CA) Elrod Scott A. (Redwood City CA) Swinehart Daniel C. (Palo A, Method for granting a user request having locational and contextual attributes consistent with user policies for devices.
Gillis, Herbert R., Method for retrieving semantically distant analogies.
Theimer Marvin M. (Mountain View CA) Spreitzer Michael J. (Tracy CA) Weiser Mark D. (Palo Alto CA) Goldstein Richard J. (San Francisco CA) Elrod Scott A. (Redwood City CA) Swinehart Daniel C. (Palo A, Method for selectively performing event on computer controlled device whose location and allowable operation is consiste.
Theimer Marvin M. (Mountain View CA) Spreitzer Michael J. (Tracy CA) Weiser Mark D. (Palo Alto CA) Goldstein Richard J. (San Francisco CA) Elrod Scott A. (Redwood City CA) Swinehart Daniel C. (Palo A, Method for triggering selected machine event when the triggering properties of the system are met and the triggering con.
Yifan Gong, Method of adapting speech recognition models for speaker, microphone, and noisy environment.
Leung, Ting Yu; Urata, Monica Sachiye; Vora, Swati, Method of simplifying and optimizing scalar subqueries and derived tables that return exactly or at most one tuple.
Miller, Paul Andrew; Benedyk, Robby Darren; Ravishankar, Venkataramaiah; Marsico, Peter Joseph, Methods and systems for providing database node access control functionality in a communications network routing node.
Baskey,Michael Edward; Brabson,Roy Frank; Huynh,Lap Thiet; Yocom,Peter Bergersen, Methods, systems and computer program products for server based type of service classification of a communication request.
Katzman James A. (San Jose CA) Bartlett Joel F. (Palo Alto CA) Bixler Richard M. (Sunnyvale CA) Davidow William H. (Atherton CA) Despotakis John A. (Pleasanton CA) Graziano Peter J. (Los Altos CA) Gr, Multiprocessor system.
Lippmann Wouter J. H. M. (Eindhoven NLX) Kessels Jozef L. W. (Eindhoven NLX) Eggenhuisen Huibert H. (Eindhoven NLX) Dijkstra Hendrik (Eindhoven NLX), Multiprocessor system comprising a plurality of data processors which are interconnected by a communication network.
Hamilton Graham (Palo Alto CA) Powell Michael L. (Palo Alto CA) Mitchell James G. (Los Altos CA) Gibbons Jonathan J. (Mountain View CA), Object oriented system for executing application call by using plurality of client-side subcontract mechanism associated.
Ellacott Bruce A.,CAX, Object-oriented query mechanism.
Nunez, Chris, Open format for file storage system indexing, searching and data retrieval.
Srivastava Divesh ; Stuckey Peter J.,AUX ; Sudarshan Sundararajarao,INX, Optimization of queries using relational algebraic theta-semijoin operator.
Meredith,L. Gregory; Bjorg,Steve; Richter,David, Permutation nuances of the integration of processes and queries as processes at queues.
Raitto John ; Ziauddin Mohamed ; Finnerty James, Rewriting a query in terms of a summary based on aggregate computability and canonical format, and when a dimension tabl.
Theimer Marvin M. (Mountain View CA) Spreitzer Michael J. (Tracy CA) Weiser Mark D. (Palo Alto CA) Goldstein Richard J. (San Francisco CA) Terry Douglas B. (San Carlos CA) Schilit William N. (Palo Al, Selective delivery of electronic messages in a multiple computer system based on context and environment of a user.
Simor Gabor (Barrington IL), Self-configuration of nodes in a distributed message-based operating system.
Goronzy, Silke; Kompe, Ralf; Buchner, Peter; Iwahashi, Naoto, Semi-supervised speaker adaptation.
Theimer Marvin M. ; Spreitzer Michael J. ; Weiser Mark D. ; Goldstein Richard J. ; Swinehart Daniel C. ; Schilit William N. ; Want Roy, Specifying and establishing communication data paths between particular media devices in multiple media device computing.
Junqua Jean-Claude, Speech recognition and teaching apparatus able to rapidly adapt to difficult speech of children and foreign speakers.
Abbott, III, Kenneth H.; Newell, Dan; Robarts, James O., Storing and recalling information to augment human memories.
Abbott, III, Kenneth H.; Newell, Dan; Robarts, James O., Storing and recalling information to augment human memories.
Meredith,L. Gregory; Bjorg,Steve; Richter,David, Structural equivalence of expressions containing processes and queries.
Abbott, Kenneth H.; Newell, Dan; Robarts, James O.; Swapp, Ken, Supplying enhanced computer user's context data.
Abbott, Kenneth H.; Freedman, Joshua M.; Newell, Dan; Robarts, James O., Supplying notifications related to supply and consumption of user context data.
Warwick, Alan M.; Naik, Dilip C., System and method for accessing information made available by a kernel mode driver.
Marx Matthew T. ; Carter Jerry K. ; Phillips Michael S. ; Holthouse Mark A. ; Seabury Stephen D. ; Elizondo-Cecenas Jose L. ; Phaneuf Brett D., System and method for developing interactive speech applications.
Mani, Murali; Sundaresan, Neelakantan, System and method for query processing and optimization for XML repositories.
Faybishenko, Yaroslav; Kan, Gene H.; Camarda, Thomas J.; Botros, Sherif; Beatty, John; Cutting, Douglass R., System and method for resolving distributed network search queries to information providers.
Theimer Marvin M. (Mountain View CA) Spreitzer Michael J. (Tracy CA) Weiser Mark D. (Palo Alto CA) Goldstein Richard J. (San Francisco CA) Elrod Scott A. (Redwood City CA) Swinehart Daniel C. (Palo A, System for granting ownership of device by user based on requested level of ownership, present state of the device, and.
Conner Mike H. (Austin TX) Martin Andrew R. (Austin TX) Raper Larry K. (Austin TX), System for producing language neutral objects and generating an interface between the objects and multiple computer lang.
Pyreddy Pallavi ; Croft W. Bruce, Systems and methods for retrieving tabular data from textual sources.
Arimilli, Ravi Kumar; Dodson, John Steven; Fields, Jr., James Stephen, Two-stage request protocol for accessing remote memory data in a NUMA data processing system.
Chen, Shyh-Kwei; Lo, Ming-Ling, Universal output constructor for XML queries universal output constructor for XML queries.
Bishop, Christopher; Winn, John; Spiegelhalter, David J., Variational inference engine for probabilistic graphical models.
Meredith, Lucius Gregory, XML-based representation of mobile process calculi.
Zintel, William M.; Gandhi, Amar S.; Gu, Ye; Pather, Shyamalan; Schlimmer, Jeffrey C.; Rude, Christopher M.; Weisman, Daniel R.; Ryan, Donald R.; Leach, Paul J.; Cai, Ting; Knight, Holly N.; Ford, Pe, XML-based template language for devices and services.

Patents citing this patent (14)

London, Justin, Adaptive virtual intelligent agent.
Francis, James Covosso, Deciding whether a received signal is a signal of interest.
Paek, Timothy S.; Chickering, David M., Dialog repair based on discrepancies between user model predictions and speech recognition results.
Markovic, Ivan, Enhanced process query framework.
Zappella, Giovanni; Archambeau, Cedric Philippe Charles Jean Ghislain, Optimized selection and delivery of content.
Tomkins, Andrew; Ravikumar, Shanmugasundaram; Agarwal, Shalini; Yang, MyLinh; Pang, Bo; Li, Mark Yinan, Providing additional information related to a vague term in a message.
Williams, Jason D., System and method for generating manually designed and automatically optimized spoken dialog systems.
Williams, Jason D., System and method for generating manually designed and automatically optimized spoken dialog systems.
Williams, Jason D., System and method for generating manually designed and automatically optimized spoken dialog systems.
Badger, Eric Norman; Linerud, Drew Elliott; Almog, Itai; Paek, Timothy S.; Sundararajan, Parthasarathy; Walters, Kenneth R.; Peterson, Andrew Douglas; Davis, Shawna Julie; Sengupta, Tirthankar, Typing assistance for editing.
Badger, Eric Norman; Linerud, Drew Elliott; Almog, Itai; Paek, Timothy S.; Sundararajan, Parthasarathy; Walters, Kenneth R.; Peterson, Andrew Douglas; Davis, Shawna Julie; Sengupta, Tirthankar, Typing assistance for editing.
Badger, Eric Norman; Linerud, Drew Elliot; Almog, Itai; Paek, Timothy S.; Sundararajan, Parthasarathy; Rudchenko, Dmytro; Gunawardana, Asela J., User-centric soft keyboard predictive technologies.
Badger, Eric Norman; Linerud, Drew Elliott; Almog, Itai; Paek, Timothy S.; Sundararajan, Parthasarathy; Rudchenko, Dmytro; Gunawardana, Asela J., User-centric soft keyboard predictive technologies.
Badger, Eric Norman; Linerud, Drew Elliott; Almog, Itai; Paek, Timothy S.; Sundararajan, Parthasarathy; Rudchenko, Dmytro; Gunawardana, Asela J., User-centric soft keyboard predictive technologies.

