A method for the computer-assisted exploration of states of a technical system is provided. The states of the technical system are run by carrying out an action in a respective state of the technical system, the action leading to a new state. A safety function and a feedback rule are used to ensure that a large volume of state and action data is collected during exploration and that, at the same time, no inadmissible actions occur which could lead directly or indirectly to the technical system being damaged or to a defective operating state. The method allows a large number of states and actions relating to the technical system to be collected and may be used for any technical system, in particular for the exploration of states in a gas turbine. The method may be used both in real operation and during simulation of the operation of a technical system.
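The exploration procedure summarized in the abstract can be sketched as a simple loop: execute only actions the safety function deems permissible, and fall back to the backup policy whenever an unknown state is reached. This is a minimal illustrative sketch, not the patented implementation; the callables `actions`, `step`, `is_safe`, and `backup_policy` are hypothetical stand-ins for the system's action set, its state transition, the safety function, and the backup policy.

```python
import random

def explore(initial_state, actions, step, is_safe, backup_policy, max_steps=1000):
    """Safe exploration sketch: execute an action only if the safety
    function classifies it as permissible; on reaching an unknown
    (not previously run) state, follow the backup policy until a
    known state is reached again."""
    known = {initial_state}   # states already run
    transitions = []          # collected (state, action, reward, new_state)
    state = initial_state
    for _ in range(max_steps):
        if state not in known:
            # Unknown state: return toward known territory via the backup policy.
            action = backup_policy(state)
        else:
            # Known state: explore any permissible action.
            candidates = [a for a in actions(state) if is_safe(state, a)]
            if not candidates:
                break         # nothing permissible left to explore
            action = random.choice(candidates)
        new_state, reward = step(state, action)
        transitions.append((state, action, reward, new_state))
        known.add(state)
        state = new_state
    return transitions
```

With a toy one-dimensional system whose safe region is the interval [0, 3], the loop collects transitions without ever leaving that region.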
Representative Claims
1. A method for computer-assisted exploration of states of a technical system, comprising: running consecutive states of the technical system by an action, whereby executing the action in a respective state leads to a new state being executed; using a safety function to determine whether the action is a permissible or impermissible action in the technical system before an execution of the action leads to an unknown state not previously run, with the action only being executed if the action is permissible; and selecting a subsequent action based on a backup policy when reaching the unknown state in order to return the state to a known state, whereby a known state is a previously run state, wherein the action executed is assigned a reward as a function of the state in which the action is executed, and wherein the new state is reached by the action, wherein the impermissible action features a reward which is smaller than a prespecified value, wherein the safety function is learned based on a plurality of rewards of a plurality of actions, and wherein a pair comprising the state and the action executed in the state run is assigned a minimum reward which occurs for the action after the execution of the action and on subsequently running the backup policy, wherein the safety function is determined based on the minimum reward, and wherein the safety function establishes the impermissible action if the minimum reward is smaller than the prespecified value.

2. The method as claimed in claim 1, wherein the impermissible action is characterized such that, on execution of the impermissible action, the technical system has a probability of one or a probability of greater than zero of going into the state which leads to an undesired and/or incorrect operating state of the technical system directly after the execution of the impermissible action or indirectly after the execution of a plurality of further actions.

3. The method as claimed in claim 1, wherein, when the state of the technical system is reached in which the action to be executed is classified by the safety function as impermissible, the subsequent action is selected based on the backup policy.

4. The method as claimed in claim 1, wherein the safety function is determined with a function approximator which approximates the minimum reward based on a local extrapolation around the state currently to be changed with the action.

5. The method as claimed in claim 4, wherein the function approximator carries out a local-linear and/or local-quadratic extrapolation.

6. The method as claimed in claim 1, wherein the backup policy is a policy prespecified for the technical system.

7. The method as claimed in claim 6, wherein the backup policy is realized by an existing adjuster of the technical system.

8. The method as claimed in claim 1, wherein the backup policy is determined with a reinforcement learning method based on the plurality of rewards of the plurality of actions.

9. The method as claimed in claim 8, wherein the reinforcement learning method is based on an optimality criterion in accordance with which a minimum of the expected value of all future rewards is maximized.

10. A method for computer-assisted exploration of states of a technical system, comprising: running consecutive states of the technical system by an action, whereby executing the action in a respective state leads to a new state being executed; using a safety function to determine whether the action is a permissible or impermissible action in the technical system before an execution of the action leads to an unknown state not previously run, with the action only being executed if the action is permissible; and selecting a subsequent action based on a backup policy when reaching the unknown state in order to return the state to a known state, whereby a known state is a previously run state, wherein, on running the plurality of states, the plurality of states are assigned to consecutive categories such that, if the plurality of states are changed based on the backup policy, an unknown, previously not yet run state reached by the action is assigned the category to which the state was assigned before the execution of the action, or, in all other cases, an unknown, previously not yet run state reached by the action is assigned the category which follows the category to which the state was assigned before the execution of the action.

11. The method as claimed in claim 10, wherein the plurality of states are run according to a plurality of categories such that in one category all possible actions to be executed are first explored and subsequently a transition is made to the next category.

12. The method as claimed in claim 11, wherein the plurality of states of the category are run with a graph-based pathfinder method in which, during the running of the plurality of states, a graph is constructed, wherein a plurality of nodes on the graph correspond to the plurality of run states, and wherein a plurality of edges on the graph correspond to the plurality of executed actions, and wherein for each node the category of the corresponding state is stored, whereby, on reaching the state in which all possible actions have already been explored, a search is made in the graph for a path to a state in the same category in which actions can still be explored, and, on finding such a path, this path is taken.

13. The method as claimed in claim 12, wherein, in an event of no path being found to a state in the same category in which actions can still be executed, the plurality of states of the following category are run.

14. The method as claimed in claim 11, wherein the plurality of states of a category are run with the reinforcement learning method based on a reward function, whereby in accordance with the reward function the action is assigned a reward when it leads to the state in the category just run in which an exploration of the action is possible.

15. The method as claimed in claim 14, wherein in the reinforcement learning method an action selection rule is updated after running a predetermined number of states, whereby the newly added actions and the respective state in which the newly added action is executed, as well as the new state reached by the action, are taken into account in the updating.

16. The method as claimed in claim 10, wherein in the graph-based pathfinder method and in the reinforcement learning method similar states of the technical system are grouped into a plurality of common clusters.
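The minimum-reward construction of claim 1 can be illustrated with a small sketch: after executing an action, the backup policy is run back to a known state, the minimum reward observed along the way is recorded for the (state, action) pair, and the safety function thresholds that minimum against the prespecified value. This is a tabular toy model under assumed interfaces (`step`, `backup_policy`, `is_known` are hypothetical callables); the patented method instead generalizes to unseen pairs with a function approximator using local-linear or local-quadratic extrapolation (claims 4 and 5).

```python
def min_reward_after(state, action, step, backup_policy, is_known, max_backup_steps=100):
    """Execute `action` in `state`, then follow the backup policy until a
    known state is reached, recording the minimum reward observed. Per
    claim 1, this minimum is what the safety function thresholds."""
    s, r = step(state, action)
    min_r = r
    for _ in range(max_backup_steps):
        if is_known(s):
            break
        s, r = step(s, backup_policy(s))
        min_r = min(min_r, r)
    return min_r

def make_safety_function(samples, threshold):
    """Tabular safety function learned from (state, action, min_reward)
    samples: an action is impermissible if its recorded minimum reward
    falls below the prespecified threshold."""
    table = {(s, a): m for (s, a, m) in samples}
    def is_safe(state, action):
        # Unseen pairs default to safe purely for this sketch; the
        # patented method would extrapolate locally instead.
        return table.get((state, action), threshold) >= threshold
    return is_safe
```

For example, with samples `[(0, 1, 0.5), (0, -1, -2.0)]` and a threshold of 0.0, the resulting safety function permits the first action and rejects the second.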
Patents cited in this patent (5)
Rajamani, Ravi; Chbat, Nicolas Wadih; Ashley, Todd Alan, Controller with neural network for estimating gas turbine internal cycle parameters.
Neuneier, Ralf; Mihatsch, Oliver, Method and configuration for determining a sequence of actions for a system which comprises statuses, whereby a status transition ensues between two statuses as a result of an action.