The deep Q-learning technique trains the weights of an artificial neural network using several distinctive features, including separate target and prediction networks and random experience replay, which avoids the problems caused by temporally correlated training samples. A hardware architecture tuned for deep Q-learning is described. Inference cores use the prediction network to determine an action to apply to an environment. A replay memory stores the results of that action. Training cores update the weights of the prediction network using a loss function derived from the outputs of both the target and prediction networks. A high-speed copy engine periodically copies the weights of the prediction network to the target network.
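The abstract above outlines the standard deep Q-learning loop. The following minimal sketch shows how the prediction network, target network, replay memory, and copy engine interact. The toy linear `QNet`, the tuple format, and all hyperparameters are illustrative assumptions for the sketch, not details taken from this document, which claims a hardware realization of the same loop.

```python
import random

# Toy stand-in for a Q-network: Q(s, a) = w[a] * s for a scalar state s.
# A real implementation would use a deep network; this keeps the sketch
# self-contained while preserving the prediction/target split.
class QNet:
    def __init__(self, n_actions):
        self.w = [0.0] * n_actions

    def scores(self, s):                 # output scores for all actions
        return [wa * s for wa in self.w]

    def copy_from(self, other):          # role of the high-speed copy engine
        self.w = list(other.w)

def train_step(pred, target, replay, gamma=0.99, lr=0.1):
    # Random experience replay: sample a stored (s, a, r, s') transition.
    s, a, r, s_next = random.choice(replay)
    # TD target uses the *target* network; the score being corrected
    # comes from the *prediction* network.
    y = r + gamma * max(target.scores(s_next))
    q = pred.scores(s)[a]
    # One gradient-descent step on the squared loss (y - q)^2
    # with respect to the prediction weights.
    pred.w[a] += lr * (y - q) * s
```

In a full loop, an inference core would pick the action (for example, epsilon-greedily from `pred.scores(s)`), store the resulting tuple in the replay memory, and the copy engine would call `target.copy_from(pred)` every N training steps.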
Representative Claims
1. A method for training a prediction artificial neural network, the method comprising: applying, by one or more inference cores, state information for time step t to a prediction artificial neural network having weights stored in a prediction network weight memory, to obtain output scores for a set of actions; selecting an action from the set of actions based on the output scores, for application to an environment, to advance the environment to time step t+1; storing a tuple for a transition from state s_t to state s_{t+1} into a replay memory, the tuple including the selected action and a reward provided by the environment; and adjusting, by one or more training cores, weights of the prediction artificial neural network stored in the prediction network weight memory based on application of states s_t and s_{t+1} from the tuple to the prediction artificial neural network and a target artificial neural network having weights stored in a target network weight memory, respectively.

2. The method of claim 1, wherein adjusting the weights of the prediction artificial neural network includes: sampling, by the one or more training cores, one or more tuples from the replay memory, where each tuple includes a state s_j, an action a_j, a reward for the action r_j, and a subsequent state s_{j+1}.

3. The method of claim 2, wherein adjusting the weights of the prediction artificial neural network further includes: applying, by the one or more training cores, state s_{j+1} to the target artificial neural network and obtaining a highest action score output from the target artificial neural network.

4. The method of claim 3, wherein adjusting the weights of the prediction artificial neural network further includes: applying, by the one or more training cores, state s_j to the prediction artificial neural network to obtain an action score for action a_j.

5. The method of claim 4, wherein adjusting the weights of the prediction artificial neural network further includes: determining, by the one or more training cores, a loss function based on the highest action score output by the target artificial neural network for state s_{j+1}, the action score for action a_j output by the prediction artificial neural network, and the reward r_j.

6. The method of claim 5, wherein adjusting the weights of the prediction artificial neural network further includes: performing, by the one or more training cores, a gradient descent operation on the loss function with respect to the weights of the prediction artificial neural network.

7. The method of claim 1, further comprising: periodically updating the weights of the target artificial neural network via a copy engine by copying the weights of the prediction artificial neural network into the target network weight memory.

8. The method of claim 1, further comprising: repeating the applying, selecting, storing, and adjusting steps for each step of an episode of training.

9. The method of claim 8, further comprising: performing multiple episodes of training to train the prediction artificial neural network.

10. A machine learning device for training a prediction artificial neural network, the machine learning device comprising: a set of memories including a replay memory, a prediction network weight memory, and a target network weight memory; one or more inference cores configured to apply state information for time step t to a prediction artificial neural network having weights stored in the prediction network weight memory, to obtain output scores for a set of actions; an action selection processor, comprising one of the one or more inference cores or a processor other than the one or more inference cores, configured to select an action from the set of actions based on the output scores, for application to an environment, to advance the environment to time step t+1; a tuple storing processor, comprising one of the one or more inference cores or a processor other than the one or more inference cores, configured to store a tuple for a transition from state s_t to state s_{t+1} into the replay memory, the tuple including the selected action and a reward provided by the environment; and one or more training cores configured to adjust weights of the prediction artificial neural network stored in the prediction network weight memory based on application of states s_t and s_{t+1} from the tuple to the prediction artificial neural network and a target artificial neural network having weights stored in the target network weight memory, respectively.

11. The machine learning device of claim 10, wherein adjusting the weights of the prediction artificial neural network includes: sampling, by the one or more training cores, one or more tuples from the replay memory, where each tuple includes a state s_j, an action a_j, a reward for the action r_j, and a subsequent state s_{j+1}.

12. The machine learning device of claim 11, wherein adjusting the weights of the prediction artificial neural network further includes: applying, by the one or more training cores, state s_{j+1} to the target artificial neural network and obtaining a highest action score output from the target artificial neural network.

13. The machine learning device of claim 12, wherein adjusting the weights of the prediction artificial neural network further includes: applying, by the one or more training cores, state s_j to the prediction artificial neural network to obtain an action score for action a_j.

14. The machine learning device of claim 13, wherein adjusting the weights of the prediction artificial neural network further includes: determining, by the one or more training cores, a loss function based on the highest action score output by the target artificial neural network for state s_{j+1}, the action score for action a_j output by the prediction artificial neural network, and the reward r_j.

15. The machine learning device of claim 14, wherein adjusting the weights of the prediction artificial neural network further includes: performing, by the one or more training cores, a gradient descent operation on the loss function with respect to the weights of the prediction artificial neural network.

16. The machine learning device of claim 10, further comprising: a copy engine configured to periodically update the weights of the target artificial neural network by copying the weights of the prediction artificial neural network into the target network weight memory.

17. The machine learning device of claim 10, wherein the one or more inference cores, the action selection processor, the tuple storing processor, and the one or more training cores are further configured to: repeat the applying, selecting, storing, and adjusting for each step of an episode of training.

18. The machine learning device of claim 17, wherein the one or more inference cores, the action selection processor, the tuple storing processor, and the one or more training cores are further configured to: perform multiple episodes of training to train the prediction artificial neural network.

19. A computing device for training a prediction artificial neural network, the computing device comprising: a central processor configured to interface with an environment by applying actions to the environment and observing states and rewards output by the environment; and a machine learning device for training the prediction artificial neural network, the machine learning device comprising: a set of memories including a replay memory, a prediction network weight memory, and a target network weight memory; one or more inference cores configured to apply state information for time step t to a prediction artificial neural network having weights stored in the prediction network weight memory, to obtain output scores for a set of actions; an action selection processor, comprising one of the one or more inference cores, configured to select an action from the set of actions based on the output scores, for application to the environment, to advance the environment to time step t+1; a tuple storing processor, comprising one of the one or more inference cores, configured to store a tuple for a transition from state s_t to state s_{t+1} into the replay memory, the tuple including the selected action and a reward provided by the environment; and one or more training cores configured to adjust weights of the prediction artificial neural network stored in the prediction network weight memory based on application of states s_t and s_{t+1} from the tuple to the prediction artificial neural network and a target artificial neural network having weights stored in the target network weight memory, respectively.

20. The computing device of claim 19, wherein adjusting the weights of the prediction artificial neural network includes: sampling, by the one or more training cores, one or more tuples from the replay memory, where each tuple includes a state s_j, an action a_j, a reward for the action r_j, and a subsequent state s_{j+1}.
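Claims 3 through 6 (and their device counterparts, claims 12 through 15) together define the loss computation: the target network supplies the highest action score for state s_{j+1}, the prediction network supplies the score for the taken action a_j, and the reward r_j completes the temporal-difference target. A hedged sketch of that loss, assuming the standard squared-error form and a discount factor gamma, neither of which the claims specify:

```python
def td_loss(q_pred_sj_aj, max_q_target_sj1, r_j, gamma=0.99):
    # TD target: y = r_j + gamma * max_a Q_target(s_{j+1}, a)
    y = r_j + gamma * max_q_target_sj1
    # Squared difference against Q_pred(s_j, a_j); the training cores
    # then run gradient descent on this quantity (claim 6).
    return (y - q_pred_sj_aj) ** 2
```

When the prediction already matches the target exactly, the loss is zero and the gradient step leaves the weights unchanged.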
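The replay memory recited in claims 1, 2, 10, 11, and 20 stores transition tuples and is sampled at random, which is what breaks the temporal correlation mentioned in the abstract. A minimal sketch, assuming a bounded FIFO buffer and uniform sampling without replacement; the capacity and batch size here are illustrative choices, not taken from the claims:

```python
import random
from collections import deque

class ReplayMemory:
    def __init__(self, capacity=10_000):
        self.buf = deque(maxlen=capacity)  # oldest tuples evicted first

    def store(self, s, a, r, s_next):      # one (s_j, a_j, r_j, s_{j+1}) tuple
        self.buf.append((s, a, r, s_next))

    def sample(self, batch_size):          # uniform random, without replacement
        return random.sample(list(self.buf), batch_size)
```

A dedicated tuple-storing processor, as in claims 10 and 19, would perform the `store` step, while the training cores would call `sample`.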