IPC Classification Information
Country / Type |
United States (US) Patent
Granted
|
International Patent Classification (IPC, 7th edition) |
|
Application number |
UP-0658683
(2005-07-26)
|
Registration number |
US-7672913
(2010-04-21)
|
Priority information |
EP-04291912(2004-07-27) |
International application number |
PCT/EP2005/008724
(2005-07-26)
|
§371/§102 date |
20070129
(20070129)
|
International publication number |
WO06/010645
(2006-02-02)
|
Inventor / Address |
- Kaplan, Frederic
- Oudeyer, Pierre-Yves
|
Applicant / Address |
|
Agent / Address |
Oblon, Spivak, McClelland, Maier & Neustadt, L.L.P.
|
Citation information |
Times cited: 10
Cited patents: 3 |
Abstract
In order to promote efficient learning of relationships inherent in a system or setup S described by system-state and context parameters, the next action to take, affecting the setup, is determined based on the knowledge gain expected to result from this action. Knowledge-gain is assessed “locally” by comparing the value of a knowledge-indicator parameter after the action with the value of this indicator on one or more previous occasions when the system-state/context parameter(s) and action variable(s) had similar values to the current ones. Preferably the “level of knowledge” is assessed based on the accuracy of predictions made by a prediction module. This technique can be applied to train a prediction machine by causing it to participate in the selection of a sequence of actions. This technique can also be applied for managing development of a self-developing device or system, the self-developing device or system performing a sequence of actions selected according to the action-selection technique.
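The "local" knowledge-gain assessment described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the class name `RegionErrorHistory` and the `window` and `delay` parameters are hypothetical, and the sketch assumes that the knowledge indicator is a scalar prediction error and that gain is measured as the drop in mean error between an older window and the most recent window of errors recorded for similar state/context/action values.

```python
from collections import deque


class RegionErrorHistory:
    """Prediction errors recorded for one group of similar situations (hypothetical helper)."""

    def __init__(self, window: int = 5, delay: int = 5):
        self.window = window  # errors averaged per estimate (assumed value)
        self.delay = delay    # offset between the two compared windows (assumed value)
        # Only the errors needed for the comparison are retained.
        self.errors = deque(maxlen=window + delay)

    def record(self, error: float) -> None:
        """Store the actual prediction error observed after an action."""
        self.errors.append(error)

    def knowledge_gain(self) -> float:
        """Mean error of the older window minus mean error of the recent window.

        A positive value means predictions in this group are improving.
        Returns 0.0 while there is too little history to compare.
        """
        if len(self.errors) < self.window + self.delay:
            return 0.0
        errs = list(self.errors)
        older = errs[: self.window]
        recent = errs[-self.window:]
        return sum(older) / self.window - sum(recent) / self.window
```

Under these assumptions, an action falling in a group whose errors are steadily shrinking scores higher than one whose errors are flat, whether those errors are uniformly low (already learned) or uniformly high (currently unlearnable).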
Representative claim
The invention claimed is:
1. An automated action-selection system adapted to generate signals specifying values for a set of one or more action variables defining an action that can be taken whereby to affect a setup S, the automated action-selection system comprising: input means for receiving signals indicative of the value, at a time t, of a set of zero or more system-state/context parameters (SC(t)) describing the state and/or context of the setup S; a region definer adapted to define a set of regions in a multi-dimensional system-state/context/action space, each dimension of the system-state/context/action space being defined by a respective different parameter or variable of the sets of system-state/context parameters and action variables; means for determining a set of candidate actions, each candidate action consisting of a possible set of values for the action variables; a region identifier for identifying the region in system-state/context/action space containing the combination of a given candidate action with values of any system-state/context parameters at time t; a prediction unit adapted to predict the value of a set of one or more predicted variables (VAR) a predetermined interval after time t, wherein a prediction function applied by the prediction unit depends upon the region in system-state/context/action space containing the combination of this given candidate action with any system-state/context parameters at time t; calculator means adapted to calculate, for selected candidate actions, a respective indicator of the actual error in the prediction made by the prediction unit for said selected candidate action; memory means for storing indicators of actual prediction errors made by the prediction unit for respective candidate actions selected on one or more previous occasions; assessment means adapted to evaluate the expected improvement in the performance of the prediction unit if a given candidate action is performed, wherein an assessment performed by the assessment means depends upon the region R in system-state/context/action space containing the combination of this given candidate action with the values, at time t, of any system-state/context parameters, and the assessment means is further adapted to evaluate said expected improvement by comparing an indicator of the actual prediction error that existed on one or more occasions, previous to time t, when the setup S had a combination of system-state/context parameters and action variables located in the same region R of the system-state/context/action; and means for generating a signal indicating the desirability of selecting a given candidate action for performance, said signal being dependent on the expected improvement in the performance of the prediction unit evaluated by the assessment unit for said given candidate action.
2. The automated action-selection system according to claim 1, and comprising an action selector adapted to select an action for performance to affect the setup S, the action selector having a probability of p, where 0<p<1, of selecting that one of the set of candidate actions that the assessment unit evaluates to be expected to yield the greatest improvement in performance of the prediction unit; wherein the action-selection system outputs data defining the action selected by the action selector.
3. The automated action-selection system according to claim 2, wherein the action selector has a probability of 1-p of selecting a random action for performance.
4. The automated action-selection system according to claim 1, wherein the region definer is adapted to define the regions in system-state/context/action space dynamically, wherein the region divider divides an existing region R into two or more new regions when a first criterion (C1) is met.
5. The automated action-selection system according to claim 4, and comprising counting means for counting the number of occasions on which an action is taken affecting the setup S and the combination of the action variable values defining said action with the values of system-state/context parameter values at the time said action is taken, falls within the region R; wherein the first criterion (C1) is met when the counting means has counted up to a predetermined number (NS).
6. The automated action-selection system according to claim 1, and comprising meta prediction means adapted to evaluate the expected prediction error for predictions made by the prediction means; wherein the assessment means is adapted to evaluate the expected improvement in the performance of the prediction means if a given candidate action is performed by evaluating the decrease in prediction error that is expected to result from performance of said given candidate action.
7. The automated action-selection system according to claim 1, and comprising means for receiving feedback regarding the actual values of the predicted variables (VAR) resulting from performance of a given action; the prediction means is responsive to the feedback data whereby to adapt the prediction function that is applied for candidate actions in the same region of system-state/context/action space as the region containing the combination of the action which produced the feedback data and values of any system-state/context parameters at the time when said given action was performed.
8. The automated action-selection system according to claim 7, and comprising memory means for storing training example data; for a given performed action said training example data comprising action variable values, any system-state/context parameter values applicable at the time said given action was performed, and feedback data defining the actual values of the set of predicted variables (VAR).
9. The automated action-selection system according to claim 8, wherein the region definer is adapted dynamically to define regions in system-state/context/action space and, when dividing a region R of system-state/context/action space into two or more new regions defines boundaries of the new regions so that there is a balance between the numbers of training examples in each new region, and the variance of the training examples is minimized in system-state/context/action space, or in a space defined by the set of predicted variables, or in a multi-dimensional space combining system-state/context/action space and said space defined by the set of predicted variables.
10. The automated action-selection system according to claim 1, wherein the prediction unit is adapted to make predictions by applying nearest-neighbours algorithms.
11. An automated prediction-machine-training system comprising an automated action-selection system according to claim 1, wherein the combination of the region identifier and the prediction unit of the automated action-selection system constitute a trainable prediction machine; the automated prediction-machine-training system comprising means for receiving feedback regarding the actual values of the predicted variables (VAR) resulting from performance of a given action; and wherein the prediction means is responsive to the feedback data whereby to adapt the prediction function that is applied for candidate actions in the same region of system-state/context/action space as the region containing the combination of the action which produced the feedback data and any system-state/context parameters applicable at the time when said given action was performed.
12. The automated prediction-machine-training system according to claim 11, and comprising: monitoring means for monitoring the evolution of the respective prediction functions applied by the prediction unit for the different regions of system-state/context/action space; and an operation-mode setter adapted to change over operation of the prediction-machine-training system from a training mode to a prediction mode when the monitoring means determines that the rate of change of the prediction functions has fallen below a threshold level.
13. A prediction machine trained using the automated prediction-machine-training system of claim 11.
14. A computer system adapted to predict the value of a set of one or more predicted variables (VAR), said computer system having been trained using the automated prediction-machine-training system of claim 11.
15. An expert system adapted to predict the value of a set of one or more predicted variables (VAR), said computer system having been trained using the automated prediction-machine-training system of claim 11.
16. An automated action-selection system for a robot or other self-developing device or system, the action-selection system being according to any claim 1, wherein the robot or other self-developing device or system is arranged to perform actions affecting the setup S.
17. An automated action-selection system for a self-developing robot or other self-developing device or system, according to claim 16, and comprising means for supplying the signal indicating the desirability of selecting a given candidate action for performance to a decision-making unit which selects the actions to be performed by the self-developing device or system.
18. An automated action-selection system for a self-developing robot or other self-developing device or system, according to claim 16, wherein the action-selection system is separate from, but in communication with the self-developing device or system.
19. A self-developing robot, or other self-developing device or system, comprising an automated action-selection system according to claim 16.
20. A self-developing robot, or other self-developing device or system, trained by having been caused to participate in the selection and performance of a series of actions, actions in the series having been selected using the automated action-selection system of any claim 16.
21. An automated action-selection method making use of an automated action-selection system according to claim 1, the automated action-selection method comprising the steps of: providing the automated action-selection system; and inputting to the action-selection system signals indicative of the value, at a time t, of a set of zero or more system-state/context parameters (SC(t)) describing the state and/or context of the setup S.
22. The automated action-selection method according to claim 21, and comprising the step of feeding back to the action-selection system data indicative of the actual values of the predicted variables (VAR) resulting from performance of a given action, and data indicative of the applicable values of any system-state/context parameters at the time said given action was performed, wherein the prediction means is responsive to the feedback data whereby to adapt the prediction function that is applied for candidate actions in the same region of system-state/context/action space as the region containing the combination of the action which produced the feedback data and any system-state/context parameters applicable at the time when said given action was performed.
23. An automated prediction-machine-training method comprising the automated action-selection method according to claim 22, wherein the combination of the region identifier and the prediction unit of the action-selection system constitute a trainable prediction machine.
24. An automated action-selection method for a robot or other self-developing device or system, the automated action-selection method being according to claim 21, wherein the robot or other self-developing device or system is arranged to perform actions affecting the setup S.
25. An automated action-selection method for a robot or other self-developing device or system, according to claim 24, and comprising the step of supplying the signal indicating the desirability of selecting a given candidate action for performance to a decision-making unit which selects the actions to be performed by the self-developing device or system.
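The probabilistic action selector of claims 2 and 3 can be sketched as follows. This is an illustrative Python reading of the claim language, not the patented implementation: the function name `select_action`, the `expected_improvement` callback, and the default value of `p` are assumptions introduced here.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

A = TypeVar("A")


def select_action(
    candidates: Sequence[A],
    expected_improvement: Callable[[A], float],
    p: float = 0.9,  # assumed default; the claims only require 0 < p < 1
    rng: Optional[random.Random] = None,
) -> A:
    """With probability p pick the candidate with the greatest expected
    improvement in prediction performance; otherwise pick a random candidate."""
    if not candidates:
        raise ValueError("at least one candidate action is required")
    if not 0.0 < p < 1.0:
        raise ValueError("p must satisfy 0 < p < 1")
    rng = rng or random.Random()
    if rng.random() < p:
        # Greedy branch (claim 2): maximise the assessed expected improvement.
        return max(candidates, key=expected_improvement)
    # Exploratory branch (claim 3): taken with probability 1 - p.
    return rng.choice(list(candidates))
```

In this sketch `expected_improvement` stands in for the assessment means of claim 1; how that value is computed from the stored prediction errors of the region containing each candidate is left to the caller.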