[Patent] Automated action-selection system and method, and application thereof to training prediction machines and driving the development of self-developing devices

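IPC classification information
Country / Type	United States (US) Patent, Registered
International Patent Classification (IPC 7th edition)	G06N-005/00
Application number	UP-0658683 (2005-07-26)
Registration number	US-7672913 (2010-04-21)
Priority information	EP-04291912 (2004-07-27)
International application number	PCT/EP2005/008724 (2005-07-26)
§371/§102 date	2007-01-29
International publication number	WO06/010645 (2006-02-02)
Inventors / Address	Kaplan, Frederic; Oudeyer, Pierre-Yves
Applicant / Address	Sony France S.A.
Agent / Address	Oblon, Spivak, McClelland, Maier & Neustadt, L.L.P.
Citation information	Times cited: 10; Patents cited: 3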

Abstract

In order to promote efficient learning of relationships inherent in a system or setup S described by system-state and context parameters, the next action to take, affecting the setup, is determined based on the knowledge gain expected to result from this action. Knowledge-gain is assessed “locally” by comparing the value of a knowledge-indicator parameter after the action with the value of this indicator on one or more previous occasions when the system-state/context parameter(s) and action variable(s) had similar values to the current ones. Preferably the “level of knowledge” is assessed based on the accuracy of predictions made by a prediction module. This technique can be applied to train a prediction machine by causing it to participate in the selection of a sequence of actions. This technique can also be applied for managing development of a self-developing device or system, the self-developing device or system performing a sequence of actions selected according to the action-selection technique.
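
The local knowledge-gain assessment described in the abstract can be illustrated with a minimal Python sketch. It assumes that the knowledge indicator is the prediction error, that each region is identified by an opaque key, and that the gain is estimated as the mean error over an earlier window minus the mean error over the most recent window. The class name `RegionErrorHistory`, the `window` parameter, and the windowed-mean comparison are illustrative assumptions, not details taken from the patent.

```python
from collections import defaultdict
from statistics import mean


class RegionErrorHistory:
    """Per-region record of prediction errors (hypothetical helper)."""

    def __init__(self, window=3):
        # Number of samples in each comparison window (an assumption).
        self.window = window
        # Maps a region key to the list of errors observed in that region.
        self.errors = defaultdict(list)

    def record(self, region_id, error):
        """Store the actual prediction error observed for one action."""
        self.errors[region_id].append(error)

    def knowledge_gain(self, region_id):
        """Earlier mean error minus recent mean error for one region.

        A positive value means predictions in this region are improving.
        Returns 0.0 until the region holds two full windows of samples.
        """
        history = self.errors[region_id]
        if len(history) < 2 * self.window:
            return 0.0
        recent = history[-self.window:]
        earlier = history[-2 * self.window:-self.window]
        return mean(earlier) - mean(recent)


history = RegionErrorHistory(window=2)
for err in (1.0, 0.8, 0.4, 0.2):
    history.record("region-A", err)
print(history.knowledge_gain("region-A"))
```

Because errors are stored per region, a fall in error is only ever compared against earlier occasions with similar state, context, and action values, which is the sense in which the abstract calls the assessment "local".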

Representative claims

The invention claimed is:

1. An automated action-selection system adapted to generate signals specifying values for a set of one or more action variables defining an action that can be taken whereby to affect a setup S, the automated action-selection system comprising: input means for receiving signals indicative of the value, at a time t, of a set of zero or more system-state/context parameters (SC(t)) describing the state and/or context of the setup S; a region definer adapted to define a set of regions in a multi-dimensional system-state/context/action space, each dimension of the system-state/context/action space being defined by a respective different parameter or variable of the sets of system-state/context parameters and action variables; means for determining a set of candidate actions, each candidate action consisting of a possible set of values for the action variables; a region identifier for identifying the region in system-state/context/action space containing the combination of a given candidate action with values of any system-state/context parameters at time t; a prediction unit adapted to predict the value of a set of one or more predicted variables (VAR) a predetermined interval after time t, wherein a prediction function applied by the prediction unit depends upon the region in system-state/context/action space containing the combination of this given candidate action with any system-state/context parameters at time t; calculator means adapted to calculate, for selected candidate actions, a respective indicator of the actual error in the prediction made by the prediction unit for said selected candidate action; memory means for storing indicators of actual prediction errors made by the prediction unit for respective candidate actions selected on one or more previous occasions; assessment means adapted to evaluate the expected improvement in the performance of the prediction unit if a given candidate action is performed, wherein an assessment performed by the assessment means depends upon the region R in system-state/context/action space containing the combination of this given candidate action with the values, at time t, of any system-state/context parameters, and the assessment means is further adapted to evaluate said expected improvement by comparing an indicator of the actual prediction error that existed on one or more occasions, previous to time t, when the setup S had a combination of system-state/context parameters and action variables located in the same region R of the system-state/context/action space; and means for generating a signal indicating the desirability of selecting a given candidate action for performance, said signal being dependent on the expected improvement in the performance of the prediction unit evaluated by the assessment unit for said given candidate action.

2. The automated action-selection system according to claim 1, and comprising an action selector adapted to select an action for performance to affect the setup S, the action selector having a probability of p, where 0<p<1, of selecting that one of the set of candidate actions that the assessment unit evaluates to be expected to yield the greatest improvement in performance of the prediction unit; wherein the action-selection system outputs data defining the action selected by the action selector.

3. The automated action-selection system according to claim 2, wherein the action selector has a probability of 1-p of selecting a random action for performance.

4. The automated action-selection system according to claim 1, wherein the region definer is adapted to define the regions in system-state/context/action space dynamically, wherein the region divider divides an existing region R into two or more new regions when a first criterion (C1) is met.

5. The automated action-selection system according to claim 4, and comprising counting means for counting the number of occasions on which an action is taken affecting the setup S and the combination of the action variable values defining said action with the values of system-state/context parameter values at the time said action is taken, falls within the region R; wherein the first criterion (C1) is met when the counting means has counted up to a predetermined number (NS).

6. The automated action-selection system according to claim 1, and comprising meta prediction means adapted to evaluate the expected prediction error for predictions made by the prediction means; wherein the assessment means is adapted to evaluate the expected improvement in the performance of the prediction means if a given candidate action is performed by evaluating the decrease in prediction error that is expected to result from performance of said given candidate action.

7. The automated action-selection system according to claim 1, and comprising means for receiving feedback regarding the actual values of the predicted variables (VAR) resulting from performance of a given action; the prediction means is responsive to the feedback data whereby to adapt the prediction function that is applied for candidate actions in the same region of system-state/context/action space as the region containing the combination of the action which produced the feedback data and values of any system-state/context parameters at the time when said given action was performed.

8. The automated action-selection system according to claim 7, and comprising memory means for storing training example data; for a given performed action said training example data comprising action variable values, any system-state/context parameter values applicable at the time said given action was performed, and feedback data defining the actual values of the set of predicted variables (VAR).

9. The automated action-selection system according to claim 8, wherein the region definer is adapted dynamically to define regions in system-state/context/action space and, when dividing a region R of system-state/context/action space into two or more new regions defines boundaries of the new regions so that there is a balance between the numbers of training examples in each new region, and the variance of the training examples is minimized in system-state/context/action space, or in a space defined by the set of predicted variables, or in a multi-dimensional space combining system-state/context/action space and said space defined by the set of predicted variables.

10. The automated action-selection system according to claim 1, wherein the prediction unit is adapted to make predictions by applying nearest-neighbours algorithms.

11. An automated prediction-machine-training system comprising an automated action-selection system according to claim 1, wherein the combination of the region identifier and the prediction unit of the automated action-selection system constitute a trainable prediction machine; the automated prediction-machine-training system comprising means for receiving feedback regarding the actual values of the predicted variables (VAR) resulting from performance of a given action; and wherein the prediction means is responsive to the feedback data whereby to adapt the prediction function that is applied for candidate actions in the same region of system-state/context/action space as the region containing the combination of the action which produced the feedback data and any system-state/context parameters applicable at the time when said given action was performed.

12. The automated prediction-machine-training system according to claim 11, and comprising: monitoring means for monitoring the evolution of the respective prediction functions applied by the prediction unit for the different regions of system-state/context/action space; and an operation-mode setter adapted to change over operation of the prediction-machine-training system from a training mode to a prediction mode when the monitoring means determines that the rate of change of the prediction functions has fallen below a threshold level.

13. A prediction machine trained using the automated prediction-machine-training system of claim 11.

14. A computer system adapted to predict the value of a set of one or more predicted variables (VAR), said computer system having been trained using the automated prediction-machine-training system of claim 11.

15. An expert system adapted to predict the value of a set of one or more predicted variables (VAR), said computer system having been trained using the automated prediction-machine-training system of claim 11.

16. An automated action-selection system for a robot or other self-developing device or system, the action-selection system being according to claim 1, wherein the robot or other self-developing device or system is arranged to perform actions affecting the setup S.

17. An automated action-selection system for a self-developing robot or other self-developing device or system, according to claim 16, and comprising means for supplying the signal indicating the desirability of selecting a given candidate action for performance to a decision-making unit which selects the actions to be performed by the self-developing device or system.

18. An automated action-selection system for a self-developing robot or other self-developing device or system, according to claim 16, wherein the action-selection system is separate from, but in communication with the self-developing device or system.

19. A self-developing robot, or other self-developing device or system, comprising an automated action-selection system according to claim 16.

20. A self-developing robot, or other self-developing device or system, trained by having been caused to participate in the selection and performance of a series of actions, actions in the series having been selected using the automated action-selection system of claim 16.

21. An automated action-selection method making use of an automated action-selection system according to claim 1, the automated action-selection method comprising the steps of: providing the automated action-selection system; and inputting to the action-selection system signals indicative of the value, at a time t, of a set of zero or more system-state/context parameters (SC(t)) describing the state and/or context of the setup S.

22. The automated action-selection method according to claim 21, and comprising the step of feeding back to the action-selection system data indicative of the actual values of the predicted variables (VAR) resulting from performance of a given action, and data indicative of the applicable values of any system-state/context parameters at the time said given action was performed, wherein the prediction means is responsive to the feedback data whereby to adapt the prediction function that is applied for candidate actions in the same region of system-state/context/action space as the region containing the combination of the action which produced the feedback data and any system-state/context parameters applicable at the time when said given action was performed.

23. An automated prediction-machine-training method comprising the automated action-selection method according to claim 22, wherein the combination of the region identifier and the prediction unit of the action-selection system constitute a trainable prediction machine.

24. An automated action-selection method for a robot or other self-developing device or system, the automated action-selection method being according to claim 21, wherein the robot or other self-developing device or system is arranged to perform actions affecting the setup S.

25. An automated action-selection method for a robot or other self-developing device or system, according to claim 24, and comprising the step of supplying the signal indicating the desirability of selecting a given candidate action for performance to a decision-making unit which selects the actions to be performed by the self-developing device or system.
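
The action selector of claims 2 and 3 and the count-based splitting criterion of claims 4 and 5 can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the patented implementation: the names `select_action`, `Region`, and `split_after` are hypothetical, the region is one-dimensional for brevity, and the midpoint split stands in for the balanced, variance-minimizing split described in claim 9.

```python
import random


def select_action(candidates, expected_improvement, p=0.85, rng=random):
    """Pick the most promising candidate with probability p (claim 2),
    otherwise pick a candidate at random (claim 3).

    expected_improvement is a callable scoring one candidate; the default
    p=0.85 is an arbitrary illustrative value with 0 < p < 1.
    """
    if not candidates:
        raise ValueError("no candidate actions")
    if rng.random() < p:
        return max(candidates, key=expected_improvement)
    return rng.choice(candidates)


class Region:
    """One region of system-state/context/action space (1-D for brevity)."""

    def __init__(self, low, high, split_after):
        self.low, self.high = low, high
        self.split_after = split_after  # plays the role of NS in claim 5
        self.count = 0                  # actions that have fallen in this region

    def contains(self, x):
        return self.low <= x < self.high

    def visit(self):
        """Count one action in this region; True once criterion C1 is met."""
        self.count += 1
        return self.count >= self.split_after

    def split(self):
        """Midpoint split into two new regions (a simplifying assumption)."""
        mid = (self.low + self.high) / 2
        return [Region(self.low, mid, self.split_after),
                Region(mid, self.high, self.split_after)]


# Usage: score three candidate actions and choose one.
gains = {"turn-left": 0.1, "turn-right": 0.5, "stop": 0.2}
chosen = select_action(list(gains), gains.get, p=0.85)
print(chosen)
```

Keeping the random branch means that regions whose expected improvement is currently estimated as low are still sampled occasionally, so their error histories do not go stale.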

Patents cited by this patent (3)

Cook, Donald A.; Lukas, George; Lukas, Andrew V.; Padwa, David J., Agent based instruction system and method.
Hogan, Michael Andrew, Modular, hierarchically organized artificial intelligence entity.
Plutowski, Mark E., System for combining plurality of input control policies to provide a compositional output control policy.

Patents citing this patent (10)

Waseen, Daniel; Johnson, Bruce S.; Weber, Larry; Wacker, Paul, Actuator having a test mode.
Grabinger, Cory; McMillan, Scott; Carlson, Nathan; Waseen, Daniel; McNallan, Torrey William, Actuator having an address selector.
Grabinger, Cory; McNallan, Torrey William; Waseen, Daniel; Thomle, Adrienne; McMillan, Scott, Actuator having an adjustable auxiliary output.
Grabinger, Cory; Thomle, Adrienne; McNallan, Torrey William; Waseen, Daniel, Actuator having an adjustable running time.
Bokusky, Mark David; Sibilski, Robert; Jasiczek, Liliana, Actuator power control circuit having fail-safe bypass switching.
Grabinger, Cory; McNallan, Torrey William; Waseen, Daniel; Thomle, Adrienne; McMillan, Scott, Actuator with diagnostics.
Grabinger, Cory; McNallan, Torrey William; Waseen, Daniel; Thomle, Adrienne; McMillan, Scott, Actuator with diagnostics.
McNallan, Torrey William; Waseen, Daniel; Wacker, Paul, Dual potentiometer address and direction selection for an actuator.
Benaim, Carlos; Reiner, Miriam, Natural machine interface system.
Bartholomew, John C.; Bokusky, Mark D.; Moeller, Steffen J., Power supply compensation for an actuator.
