Automatic signature generation for malicious PDF files
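IPC classification information
Country / Type: United States (US) Patent, Granted
International Patent Classification (IPC, 7th edition): G06F-021/60, G06F-021/56, H04L-029/06
Application number: US-0115036 (2011-05-24)
Registration number: US-8695096 (2014-04-08)
Inventor / Address: Zhang, Liang
Applicant / Address: Palo Alto Networks, Inc.
Agent / Address: Van Pelt, Yi & James LLP
Citation information
Times cited: 94
Patents cited: 5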
Abstract
In some embodiments, automatic signature generation for malicious PDF files includes: parsing a PDF file to extract script stream data embedded in the PDF file; determining whether the extracted script stream data within the PDF file is malicious; and automatically generating a signature for the PDF file.
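The first step in the abstract, extracting script stream data embedded in a PDF file, can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes a simple regex scan instead of a full PDF parser, handles only `/JS` entries that are either string literals or indirect references to a stream, and decompresses only `/FlateDecode` streams. The function name `extract_script_streams` and all regular expressions are hypothetical.

```python
import re
import zlib

# Assumption: a regex scan is enough for illustration. Real PDFs (object
# streams, nested or escaped parentheses, other filters) need a full parser.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj(.*?)endobj", re.DOTALL)
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
JS_REF_RE = re.compile(rb"/JS\s+(\d+)\s+(\d+)\s+R")
JS_LITERAL_RE = re.compile(rb"/JS\s*\((.*?)\)", re.DOTALL)


def extract_script_streams(pdf_bytes: bytes) -> list[bytes]:
    """Return JavaScript payloads reachable through /JS entries."""
    # Map object number -> raw object body.
    objects = {int(m.group(1)): m.group(3) for m in OBJ_RE.finditer(pdf_bytes)}
    scripts = []
    for body in objects.values():
        # Case 1: the script is an inline string literal.
        for literal in JS_LITERAL_RE.finditer(body):
            scripts.append(literal.group(1))
        # Case 2: the script lives in a separate stream object.
        for ref in JS_REF_RE.finditer(body):
            target = objects.get(int(ref.group(1)), b"")
            stream = STREAM_RE.search(target)
            if stream is None:
                continue
            data = stream.group(1)
            if b"/FlateDecode" in target:
                try:
                    # decompressobj ignores the end-of-line bytes that
                    # trail the compressed data before "endstream".
                    data = zlib.decompressobj().decompress(data)
                except zlib.error:
                    pass  # keep the raw bytes if decompression fails
            else:
                data = data.rstrip(b"\r\n")
            scripts.append(data)
    return scripts
```

Each returned payload is raw bytes, so a later step can inspect it for suspicious patterns without decoding it first.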
Representative claims
1. A system, comprising: a processor configured to: parse a PDF file to extract script stream data embedded in the PDF file, wherein the PDF file is known to include malicious content; and determine whether to generate a signature associated with the PDF file based at least in part on at least a portion of the extracted script stream data: in the event that the signature associated with the PDF file is determined to be based at least in part on the at least portion of the extracted script stream data, automatically generate the signature associated with the PDF file based at least in part on the at least portion of the extracted script stream data, wherein the signature is configured to be matched against a potentially malicious PDF file; and in the event that the signature associated with the PDF file is determined not to be based at least in part on the at least portion of the extracted script stream data, automatically generate the signature associated with the PDF file from an identified cross-reference table from a plurality of cross-reference tables within the PDF file, wherein the identified cross-reference table is identified from the plurality of cross-reference tables based at least in part on a position of the identified cross-reference table relative to respective positions associated with one or more cross-reference tables other than the identified cross-reference table from the plurality of cross-reference tables; and a memory coupled to the processor and configured to provide the processor with instructions.
2. The system of claim 1, wherein the processor is further configured to determine which objects, if any, within the PDF file includes JavaScript data.
3. The system of claim 1, wherein the processor is further configured to traverse through one or more objects within the PDF file to find an object associated with JavaScript data.
4. The system of claim 1, wherein determining whether to generate the signature associated with the PDF file based at least in part on the at least portion of the extracted script stream data includes: determining one or more portions of the extracted script stream data that are potentially malicious; assigning one or more numeric values corresponding to the one or more portions of the extracted script stream data that are potentially malicious, wherein the one or more numeric values are determined based on heuristics; aggregating the one or more numeric values into an aggregate numeric value; and determining whether the aggregate numeric value exceeds a threshold numeric value: in the event that the aggregate numeric value exceeds the threshold numeric value, determining to generate the signature based at least in part on the at least portion of the extracted script stream data; and in the event that the aggregate numeric value is equal to or less than the threshold numeric value, determining not to generate the signature based at least in part on the at least portion of the extracted script stream data.
5. The system of claim 4, wherein the one or more portions of the extracted script stream data that are potentially malicious include one or more of the following: an iFrame that includes an associated Uniform Resource Locator (URL) associated with a blacklisted domain and an iFrame that includes an associated URL associated with a webpage configured to download an .exe file, a .dll file, and/or a .doc file.
6. The system of claim 1, wherein the processor is further configured to de-obfuscate the PDF file.
7. The system of claim 1, wherein the processor is configured to automatically generate the signature for the PDF file based at least in part on at least a subset of portion(s) of a script within the PDF file that was determined to be malicious.
8. The system of claim 1, wherein the processor is configured to automatically generate the signature for the PDF file based at least in part on selecting a plurality of patterns exceeding a suspicious threshold to automatically generate the signature using the plurality of patterns.
9. The system of claim 1, wherein the processor is configured to automatically generate the signature for the PDF file based at least in part on selecting a plurality of patterns exceeding a suspicious threshold to automatically generate the signature using the plurality of patterns, wherein each of the plurality of patterns is based on different threshold numeric values.
10. A method, comprising: parsing a PDF file to extract script stream data embedded in the PDF file, wherein the PDF file is known to include malicious content; and determining whether to generate a signature associated with the PDF file based at least in part on at least a portion of the extracted script stream data: in the event that the signature associated with the PDF file is determined to be based at least in part on the at least portion of the extracted script stream data, automatically generating the signature associated with the PDF file based at least in part on the at least portion of the extracted script stream data, wherein the signature is configured to be matched against a potentially malicious PDF; and in the event that the signature associated with the PDF file is determined not to be based at least in part on the at least portion of the extracted script stream data, automatically generating the signature associated with the PDF file from an identified cross-reference table from a plurality of cross-reference tables within the PDF file, wherein the identified cross-reference table is identified from the plurality of cross-reference tables based at least in part on a position of the identified cross-reference table relative to respective positions associated with one or more cross-reference tables other than the identified cross-reference table from the plurality of cross-reference tables.
11. The method of claim 10, wherein parsing the PDF file to extract script stream data includes determining which objects, if any, within the PDF file include JavaScript data.
12. The method of claim 10, wherein parsing the PDF file to extract script stream data includes traversing through one or more objects within the PDF file to find an object associated with JavaScript data.
13. The method of claim 10, wherein determining whether to generate the signature associated with the PDF file based at least in part on the at least portion of the extracted script stream data includes: detecting one or more portions of the extracted script stream data that are potentially malicious; assigning one or more numeric values corresponding to the one or more portions of the extracted script stream data that are potentially malicious, wherein the one or more numeric values are determined based on heuristics; aggregating the one or more numeric values into an aggregate numeric value; and determining whether the aggregate numeric value exceeds a threshold numeric value: in the event that the aggregate numeric value exceeds the threshold numeric value, determining to generate the signature based at least in part on the at least portion of the extracted script stream data; and in the event that the aggregate numeric value is equal to or less than the threshold numeric value, determining not to generate the signature based at least in part on the at least portion of the extracted script stream data.
14. The method of claim 13, wherein the one or more portions of the extracted script stream data that are potentially malicious include one or more of the following: an iFrame that includes an associated Uniform Resource Locator (URL) associated with a blacklisted domain and an iFrame that includes an associated URL associated with a webpage configured to download an .exe file, a .dll file, and/or a .doc file.
15. The method of claim 10, further comprising de-obfuscating the PDF file.
16. The method of claim 10, wherein automatically generating the signature for the PDF file is based at least in part on at least a subset of portion(s) of a script within the PDF file.
17. The method of claim 10, wherein automatically generating the signature for the PDF file includes selecting a plurality of patterns exceeding a suspicious threshold to automatically generate the signature using the plurality of patterns.
18. The method of claim 10, wherein automatically generating the signature for the PDF file includes selecting a plurality of patterns exceeding a suspicious threshold to automatically generate the signature using the plurality of patterns, wherein each of the plurality of patterns is based on different threshold numeric values.
19. A computer program product, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for: parsing a PDF file to extract script stream data embedded in the PDF file, wherein the PDF file is known to include malicious content; and determining whether to generate a signature associated with the PDF file based at least in part on at least a portion of the extracted script stream data: in the event that the signature associated with the PDF file is determined to be based at least in part on the at least portion of the extracted script stream data, automatically generating the signature associated with the PDF file based at least in part on the at least portion of the extracted script stream data, wherein the signature is configured to be matched against a potentially malicious PDF; and in the event that the signature associated with the PDF file is determined not to be based at least in part on the at least portion of the extracted script stream data, automatically generating the signature associated with the PDF file from an identified cross-reference table from a plurality of cross-reference tables within the PDF file, wherein the identified cross-reference table is identified from the plurality of cross-reference tables based at least in part on a position of the identified cross-reference table relative to respective positions associated with one or more cross-reference tables other than the identified cross-reference table from the plurality of cross-reference tables.
20. A system, comprising: a processor configured to: determine that a PDF file does not include script stream data, wherein the PDF file is known to include malicious content; determine an identified cross-reference table from a plurality of cross-reference tables within the PDF file, wherein the identified cross-reference table is identified from the plurality of cross-reference tables based at least in part on a position of the identified cross-reference table relative to respective positions associated with one or more cross-reference tables other than the identified cross-reference table from the plurality of cross-reference tables; and automatically generate a signature for the PDF file from the identified cross-reference table; and a memory coupled to the processor and configured to provide the processor with instructions.
21. The system of claim 20, wherein the processor is further configured to de-obfuscate the PDF file.
22. The system of claim 20, wherein the identified cross-reference table is associated with a most recent incremental save associated with the PDF file.
23. The system of claim 20, wherein the processor is further configured to decrypt the identified cross-reference table.
24. The system of claim 20, wherein the processor is further configured to determine a startxref object and two continuous reference objects associated with a predetermined offset range within the identified cross-reference table.
25. The system of claim 20, wherein the processor is further configured to determine a startxref object and two continuous in use reference objects associated with a predetermined offset range within the identified cross-reference table.
26. The system of claim 20, wherein the processor is further configured to determine a startxref object and two continuous reference objects associated with a predetermined offset range within the identified cross-reference table and to automatically generate the signature associated with the PDF file based at least in part on the startxref object and the two continuous reference objects associated with the predetermined offset range within the identified cross-reference table.
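The decision logic in claims 4 and 13 (heuristic scores aggregated and compared with a threshold) and the cross-reference fallback in claims 20 to 26 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented method. The heuristic patterns, their weights, the threshold value, and the signature layout are all hypothetical. The sketch also assumes that the cross-reference table positioned last in the file corresponds to the most recent incremental save, and it does not model the "predetermined offset range" of claims 24 to 26.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns and weights. The claims only state that the numeric
# values are "determined based on heuristics"; these numbers are assumptions.
HEURISTICS = [
    (re.compile(rb"unescape\s*\(", re.I), 3),
    (re.compile(rb"eval\s*\(", re.I), 2),
    (re.compile(rb"<iframe[^>]*src\s*=", re.I), 4),
]
THRESHOLD = 4  # assumed threshold numeric value

XREF_RE = re.compile(rb"\bxref\b.*?trailer", re.DOTALL)  # skips "startxref"
ENTRY_RE = re.compile(rb"(\d{10}) (\d{5}) ([nf])")
STARTXREF_RE = re.compile(rb"startxref\s+(\d+)")


def score_script(script_data: bytes) -> int:
    """Aggregate the weights of all heuristic patterns present in the data."""
    return sum(w for pattern, w in HEURISTICS if pattern.search(script_data))


def script_signature(script_data: bytes) -> bytes:
    """Join the first hit of each matching pattern into one byte signature."""
    hits = [pattern.search(script_data) for pattern, _ in HEURISTICS]
    return b"|".join(hit.group(0) for hit in hits if hit)


def xref_signature(pdf_bytes: bytes) -> Optional[bytes]:
    """Build a signature from the xref table positioned last in the file."""
    tables = list(XREF_RE.finditer(pdf_bytes))
    start = STARTXREF_RE.findall(pdf_bytes)
    if not tables or not start:
        return None
    table = max(tables, key=lambda m: m.start()).group(0)
    entries = ENTRY_RE.findall(table)
    for first, second in zip(entries, entries[1:]):
        # Two consecutive in-use ("n") entries, in the spirit of claim 25.
        if first[2] == b"n" and second[2] == b"n":
            return b"startxref " + start[-1] + b"|" + first[0] + b"|" + second[0]
    return None


def generate_signature(
    pdf_bytes: bytes, script_data: bytes
) -> Tuple[str, Optional[bytes]]:
    """Use the script if its score exceeds the threshold, else the xref table."""
    if score_script(script_data) > THRESHOLD:
        return "script", script_signature(script_data)
    return "xref", xref_signature(pdf_bytes)
```

Note that a score equal to the threshold takes the cross-reference path, which mirrors the "equal to or less than" branch in claims 4 and 13.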
Patents cited in this patent (5)
Novitchi, Mihai, Anti-malware emulation systems and methods.
Singh, Abhishek; Mesdaq, Ali; Das, Anirban; Jain, Varun, Framework for classifying an object as malicious with machine learning for deploying updated predictive models.
Ismael, Osman Abdoul; Song, Dawn; Xue, Hui, Framework for efficient security coverage of mobile software applications using symbolic execution to reach regions of interest within an application.
Khalid, Yasir; Amin, Muhammad; Jing, Emily; Rizwan, Muhammad, Malicious content analysis with multi-version application support within single operating environment.
Khalid, Yasir; Amin, Muhammad; Jing, Emily; Rizwan, Muhammad, Malicious content analysis with multi-version application support within single operating environment.
Thioux, Emmanuel; Amin, Muhammad; Ismael, Osman Abdoul, System and method for analysis of a memory dump associated with a potentially malicious content suspect.
Paithane, Sushant; Vashist, Sai; Yang, Raymond; Khalid, Yasir, System and method for detecting file altering behaviors pertaining to a malicious attack.
Rivlin, Alexandr; Mehra, Divyesh; Uyeno, Henry; Pidathala, Vinay, System and method for determining a threat based on correlation of indicators of compromise from other sources.
Kumar, Vineet; Otvagin, Alexander; Borodulin, Nikita, System and method for triggering analysis of an object for malware in response to modification of that object.
Rivlin, Alexandr; Mehra, Divyesh; Uyeno, Henry; Pidathala, Vinay, System and method of detecting delivery of malware based on indicators of compromise from different sources.
Aziz, Ashar; Amin, Muhammad; Ismael, Osman Abdoul; Bu, Zheng, System, apparatus and method for automatically verifying exploits within suspect objects and highlighting the display information associated with the verified exploits.
Khalid, Yasir; Deshpande, Shivani; Amin, Muhammad, System, apparatus and method for detecting a malicious attack based on static analysis of a multi-flow object.
Ismael, Osman Abdoul, System, apparatus and method for using malware analysis results to drive adaptive instrumentation of virtual machines to improve exploit detection.
Ismael, Osman Abdoul, System, apparatus and method for using malware analysis results to drive adaptive instrumentation of virtual machines to improve exploit detection.
Karandikar, Shrikrishna; Amin, Muhammad; Deshpande, Shivani; Khalid, Yasir, System, device and method for detecting a malicious attack based on communcations between remotely hosted virtual machines and malicious web servers.
Karandikar, Shrikrishna; Amin, Muhammad; Deshpande, Shivani; Khalid, Yasir, System, device and method for detecting a malicious attack based on direct communications between remotely hosted virtual machines and malicious web servers.
Wang, Pengchao, Uploading signatures to gateway level unified threat management devices after endpoint level behavior based detection of zero day threats.