Apparatus and methods for control of robot actions based on corrective user inputs
Country/Type: United States (US) Patent, Granted
International Patent Classification (IPC, 7th ed.): G05B-019/04; B25J-009/16; G06N-099/00; G06N-003/00; G05D-001/00
Application number: US-0174858 (2016-06-06)
Registration number: US-9789605 (2017-10-17)
Inventors: Meier, Philip; Passot, Jean-Baptiste; Ibarz Gabardos, Borja; Laurent, Patryk; Sinyavskiy, Oleg; O'Connor, Peter; Izhikevich, Eugene
Applicant: BRAIN CORPORATION
Agent: Gazdzinski & Associates, PC
Citation information: Cited by 4 patents; cites 101 patents
Abstract
Robots have the capacity to perform a broad range of useful tasks, such as factory automation, cleaning, delivery, assistive care, environmental monitoring and entertainment. Enabling a robot to perform a new task in a new environment typically requires a large amount of new software to be written, often by a team of experts. It would be valuable if future technology could empower people, who may have limited or no understanding of software coding, to train robots to perform custom tasks. Some implementations of the present invention provide methods and systems that respond to users' corrective commands to generate and refine a policy for determining appropriate actions based on sensor-data input. Upon completion of learning, the system can generate control commands by deriving them from the sensory data. Using the learned control policy, the robot can behave autonomously.
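The abstract describes a policy, parameterized over sensory-data inputs, that is refined whenever a user issues a corrective command. A minimal sketch of that loop follows, assuming a simple linear policy trained by gradient steps on the error between the performed action and the user-specified corrective action; all names, dimensions, and the learning rate are illustrative assumptions, not taken from the patent.

```python
import numpy as np

class LinearPolicy:
    """Policy: a plurality of parameters mapping sensory-data inputs to actions."""

    def __init__(self, n_inputs, n_actions, lr=0.05):
        self.W = np.zeros((n_actions, n_inputs))  # policy parameters
        self.lr = lr                              # assumed learning rate

    def act(self, x):
        # Determine a robot action from a sensory-data input.
        return self.W @ x

    def correct(self, x, action, target):
        # Modify the policy based on the corrective command (target)
        # and the sensory-data input: one gradient step on squared error.
        err = action - target
        self.W -= self.lr * np.outer(err, x)

policy = LinearPolicy(n_inputs=3, n_actions=2)
x = np.array([1.0, 0.5, -0.2])     # first sensory-data input
target = np.array([0.4, -0.1])     # corrective robot action from the user
for _ in range(200):               # repeated corrections refine the policy
    policy.correct(x, policy.act(x), target)
print(np.allclose(policy.act(x), target, atol=1e-3))  # True
```

After training, the policy reproduces the corrected action from the same sensory input without further user input, matching the abstract's claim that the robot can then behave autonomously under the learned control policy.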
Representative Claims
1. A method for performing robot actions by a robot, the method comprising: defining a policy comprising a plurality of parameters for determining robot actions based at least in part on sensory-data inputs, the defining of the policy comprising mapping the sensory-data inputs to robot actions; receiving a first sensory-data input from a sensor; performing a first robot action at a first action time, wherein the first robot action is determined based at least in part on the first sensory-data input and application of the policy; determining that a user input was received at an input time corresponding to the first action time, wherein a corrective command at least partially derived from the user input specifies a corrective robot action for physical performance, the user input being indicative of at least partial dissatisfaction with the first robot action; and modifying the policy based on the corrective command and the first sensory-data input.

2. The method of claim 1, further comprising determining a second robot action at a second action time, wherein the second robot action is based at least in part on the modified policy and a second sensory-data input from the sensor.

3. The method of claim 1, wherein the modifying of the policy further comprises using a learning model.

4. The method of claim 1, wherein the at least partial dissatisfaction includes a discrepancy between a target robot action and the first robot action.

5. The method of claim 1, wherein the modifying of the policy comprises changing parameters relating sensory-data inputs to actuator responses that correspond to robot actions.

6. The method of claim 3, wherein the learning model includes updating parameters based on a gradient of error determined at least in part by a difference between the first robot action and a second robot action specified by a combination of the corrective command and the policy.

7. The method of claim 1, further comprising determining a first context-variable value for a context variable, wherein the first context-variable value is determined from the first sensory-data input and the policy is further determined based at least in part on the context variable.

8. A robot, comprising: an actuator configured to perform robot actions for robotic tasks; a sensor configured to detect an environmental context of the robot and generate sensory-data inputs; and a processor apparatus configured to: define a policy comprising a plurality of parameters configured to determine robot actions based at least in part on sensory-data inputs; determine that a user input was received at an input time corresponding to a performance of a first robot action corresponding to a detection of a first sensory-data input; generate a corrective command at least partially derived from the user input, the user input being indicative of at least partial dissatisfaction with the first robot action; and modify the policy based on the corrective command and the first sensory-data input.

9. The robot of claim 8, further comprising a user interface configured to receive the user input.

10. The robot of claim 8, wherein the at least partial dissatisfaction includes a discrepancy between a target robot action and the first robot action.

11. The robot of claim 8, wherein the modification of the policy further comprises usage of a learning model.

12. The robot of claim 8, wherein the processor apparatus is further configured to determine a first context-variable value for a context variable, wherein the first context-variable value is determined from the first sensory-data input and the policy is further determined based at least in part on the context variable.

13. The robot of claim 8, wherein the sensor is at least one of a light sensor, a motion detector, an inertial measurement unit, and a global positioning system receiver.

14. A non-transitory computer-readable storage medium having a plurality of instructions stored thereon, the instructions being executable by a processing apparatus to operate a robot, the instructions configured to, when executed by the processing apparatus, cause the processing apparatus to: define a policy comprising a plurality of parameters configured to determine robot actions based at least in part on sensory-data inputs, wherein the policy maps the sensory-data inputs to robot actions; receive a first sensory-data input; perform a first robot action at a first action time, wherein the first action is determined based at least in part on the first sensory-data input and application of the policy; determine that a user input was received at an input time corresponding to the first action time, wherein a corrective command at least partially derived from the user input specifies a corrective robot action for physical performance, the user input being indicative of at least partial dissatisfaction with the first robot action; and modify the policy based on the corrective command and the first sensory-data input.

15. The non-transitory computer-readable storage medium of claim 14, wherein the instructions are further configured to, when executed by the processing apparatus, determine a second robot action at a second action time, wherein the second robot action is based at least in part on the modified policy and a second sensory-data input.

16. The non-transitory computer-readable storage medium of claim 14, wherein the modification of the policy further comprises usage of a learning model.

17. The non-transitory computer-readable storage medium of claim 14, wherein the instructions are further configured to, when executed by the processing apparatus, assess whether the modified policy comprises an improvement over the policy prior to modification, the improvement being determined by a threshold being exceeded.

18. The non-transitory computer-readable storage medium of claim 14, wherein the modification of the policy comprises changing parameters relating sensory-data inputs to actuator responses that correspond to robot actions.

19. The non-transitory computer-readable storage medium of claim 16, wherein the learning model includes updating parameters based on a gradient of error determined at least in part by a difference between the first robot action and a second robot action specified by a combination of the corrective command and the policy.

20. The non-transitory computer-readable storage medium of claim 14, wherein the instructions are further configured to, when executed by the processing apparatus, determine a first context-variable value for a context variable, wherein the first context-variable value is determined from the first sensory-data input and the policy is further determined based at least in part on the context variable.

21. The non-transitory computer-readable storage medium of claim 14, wherein the at least partial dissatisfaction includes a discrepancy between a target robot action and the first robot action.
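Claims 1, 8, and 14 each hinge on determining that a user input was received at an input time "corresponding to" an action time, so that the input can be interpreted as a correction of that particular action. A hedged sketch of one way such a check could be implemented follows; the window length and function names are illustrative assumptions, as the claims do not specify how correspondence is established.

```python
CORRECTION_WINDOW_S = 2.0  # assumed correspondence window, not given in the claims

def is_corrective(input_time, action_time, window=CORRECTION_WINDOW_S):
    """True if the user-input time corresponds to the action time,
    i.e. the input arrived within `window` seconds after the action."""
    return 0.0 <= input_time - action_time <= window

print(is_corrective(10.5, 10.0))  # True: arrives 0.5 s after the action
print(is_corrective(14.0, 10.0))  # False: too late to count as a correction
```

Under this reading, only inputs inside the window trigger the policy-modification step; later inputs would instead be associated with whatever action was most recently performed.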
Patents cited by this patent (101)
Werbos Paul J., 3-brain architecture for an intelligent decision and control system.
Ito, Masato; Minamino, Katsuki; Yoshiike, Yukiko; Suzuki, Hirotaka; Kawamoto, Kenta, Apparatus and method for embedding recurrent neural networks into the nodes of a self-organizing map.
DeYong Mark R. (Las Cruces NM) Findley Randall L. (Austin TX) Eskridge Thomas C. (Las Cruces NM) Fields Christopher A. (Rockville MD), Asynchronous temporal neural processing element.
Kerr Randal H. (Richford NY) Mesnard Robert M. (Endicott NY), Automatic generation of executable computer code which commands another program to perform a task and operator modificat.
Frank D. Francone ; Peter Nordin SE; Wolfgang Banzhaf DE, Computer implemented machine learning method and system including specifically defined introns.
Spoerre Julie K. (Tallahassee FL) Lin Chang-Ching (Tallahassee FL) Wang Hsu-Pin (Tallahassee FL), Machine performance monitoring and fault classification using an exponentially weighted moving average scheme.
Grossberg Stephen (Newton Highlands MA) Kuperstein Michael (Brookline MA), Massively parallel real-time network architectures for robots capable of self-calibrating their operating parameters thr.
Abdallah, Muhammad E; Platt, Robert; Wampler, II, Charles W.; Reiland, Matthew J; Sanders, Adam M, Method and apparatus for automatic control of a humanoid robot.
Sakaue Shiyuki (Yokohama JPX) Sugimoto Koichi (Hiratsuka JPX) Arai Shinichi (Yokohama JPX), Method and apparatus for controlling a robot hand along a predetermined path.
Peltola Tero (Helsinki FIX) Matakselka Jorma (Vantaa FIX) Harju Esa (Espoo FIX) Salovuori Heikki (Helsinki FIX) Keskinen Jukka (Vantaa FIX) Makinen Kari (Helsinki FIX) Roikonen Olli (Espoo FIX), Method for congestion management in a frame relay network and a node in a frame relay network.
Wilson Charles L. (Darnestown MD) Garris Michael D. (Gaithersburg MD) Wilkinson ; Jr. Robert A. (Hyattstown MD), Object/anti-object neural network segmentation.
Yokono, Jun; Sabe, Kohtaro; Costa, Gabriel; Ohashi, Takeshi, Operational control method, program, and recording media for robot device, and robot device.
Eguchi, Toru; Yamada, Akihiro; Kusumi, Naohiro; Sekiai, Takaaki; Fukai, Masayuki; Shimizu, Satoru, Plant control system and thermal power generation plant control system.
Coenen, Olivier, Proportional-integral-derivative controller effecting expansion kernels comprising a plurality of spiking neurons associated with a plurality of receptive fields.
Hickman, Ryan; Kuffner, Jr., James J.; Bruce, James R.; Gharpure, Chaitanya; Kohler, Damon; Poursohi, Arshan; Francis, Jr., Anthony G.; Lewis, Thor, Shared robot knowledge base for use with cloud computing system.
Shaffer Gary K. (Butler PA) Whittaker William L. (Pittsburgh PA) West Jay H. (Pittsburgh PA) Clow Richard G. (Phoenix AZ) Singh Sanjiv J. (Pittsburgh PA) Lay Norman K. (Peoria IL) Devier Lonnie J. (P, System and method for detecting obstacles in the path of a vehicle.
Blumberg, Bruce; Brooks, Rodney; Buehler, Christopher J.; Deegan, Patrick A.; DiCicco, Matthew; Dye, Noelle; Ens, Gerry; Linder, Natan; Siracusa, Michael; Sussman, Michael; Williamson, Matthew M., Training and operating industrial robots.
Mochizuki, Yoshiyuki; Naka, Toshiya; Asahara, Shigeo, Virtual space control data receiving apparatus,virtual space control data transmission and reception system, virtual space control data receiving method, and virtual space control data receiving prog.