Various embodiments describe a website analyzer that can be used for the automatic identification of unauthorized or malicious websites. A website analyzer can include heuristics for automatically identifying a collection of behaviors typical of unauthorized websites. Some embodiments automatically scan content hosted across server computers in a virtual environment and proactively identify potentially malicious websites. The embodiments can also be used to automatically scan content on public networks, such as the Internet. In particular embodiments, the website analyzer can include a semantic analysis engine and a link analysis engine. The semantic analysis engine can use the tag-level structure of HTML pages to formulate metrics which define similarity of web page content. The link analysis engine can compare the structure of embedded URIs and scripts to define metrics which quantify the difference of links between an authorized site and a potentially malicious site.
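The tag-level comparison described above can be illustrated with a minimal sketch. It is not the patented implementation: the class `TagContentHasher`, the functions `content_metrics` and `content_similarity`, the choice of SHA-256, the whitespace and case normalization, and the sample pages are all assumptions made for illustration. The sketch hashes the text found inside each tag and reports the fraction of the authorized page's hashes that also appear on the candidate page.

```python
import hashlib
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagContentHasher(HTMLParser):
    """Collects a (tag, hash) pair for every non-empty text node in a page."""

    def __init__(self):
        super().__init__()
        self._stack = []      # tags that are currently open
        self.metrics = set()  # (enclosing tag, SHA-256 of normalized text)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; unmatched end tags are ignored.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Normalization (collapse whitespace, lowercase) is an assumption.
        text = " ".join(data.split()).lower()
        if text and self._stack:
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            self.metrics.add((self._stack[-1], digest))


def content_metrics(html):
    """Return the set of (tag, hash) metrics for one HTML document."""
    parser = TagContentHasher()
    parser.feed(html)
    parser.close()
    return parser.metrics


def content_similarity(reference, candidate):
    """Fraction of the reference metrics that also occur in the candidate."""
    if not reference:
        return 0.0
    return len(reference & candidate) / len(reference)


# Hypothetical pages: the suspect copies the authorized content and adds a line.
authorized = ("<html><body><h1>Example Bank</h1>"
              "<p>Sign in to your account</p></body></html>")
suspect = ("<html><body><div><h1>Example Bank</h1></div>"
           "<p>Sign in  to your account</p><p>Hurry!</p></body></html>")

score = content_similarity(content_metrics(authorized), content_metrics(suspect))
```

Here both text elements of the authorized page reappear on the suspect page, so the score is 1.0 even though the suspect page wraps the heading in an extra `div` and adds a paragraph. A score like this only indicates similar content; it says nothing yet about whether the site is unauthorized.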
Representative claim
1. A method of detecting an unauthorized site in a service center including a plurality of server computers, comprising: generating first quantitative metrics for an authorized site, wherein the first quantitative metrics are generated from at least content on the authorized site; generating second quantitative metrics for a potentially unauthorized site, wherein the second quantitative metrics are generated from at least content on the unauthorized site; comparing, using a computer, the first and second quantitative metrics so as to perform, at least in part, a content-based comparison between the unauthorized site to the authorized site; if a threshold amount of the first quantitative metrics matches the second quantitative metrics, then performing a link analysis by comparing one or more links on the authorized site and the potentially unauthorized site, and further including comparing proprietary account information associated with a customer of the service center, the customer being associated with the unauthorized site; if the links do not match, then identifying the potentially unauthorized site as not associated with the authorized site.
2. The method of claim 1, wherein generating the first quantitative metrics includes searching an object model of the authorized site for tags, detecting content elements associated with the tags, and generating first hash values of the detected content elements.
3. The method of claim 2, wherein the object model is a Document Object Model and the tags are HTML tags.
4. The method of claim 2, wherein generating the second quantitative metrics includes searching an object model of the potentially unauthorized site for tags, detecting content elements associated with the tags, and generating second hash values of the detected content elements.
5. The method of claim 4, wherein comparing the first and second quantitative metrics includes comparing the first and second hash values.
6. One or more computer-readable storage devices having instructions thereon for executing a method of detecting an unauthorized website in a service center, the method comprising: performing a content-based comparison between a first website and a second website, wherein the content-based comparison includes at least a logo pixel data comparison, wherein the second website is identified as similar to the first website; based on the comparison, automatically determining that the first website has substantially similar content to the second website; comparing source addresses and/or website structure between the first website and the second website and determining that the second website is potentially unauthorized if the content between the first and second websites are substantially similar but at least one of the source addresses is substantially different and/or the website structure is different and/or the logo pixel data is different; and comparing proprietary account information associated with a first customer of the unauthorized site and a second customer of the authorized site, wherein both the first and second customers are associated with the service center.
7. The computer-readable storage media of claim 6, wherein performing the content-based comparison further comprises: searching for start and end tags in the first website; generating first hash values for content between the start and end tags; searching for start and end tags in the second website; generating second hash values for content between the start and end tags in the second website; and comparing the first and second hash values.
8. The computer-readable storage media of claim 6, further including generating a dictionary of quantitative metrics for authorized websites by calculating a string of hash values for the authorized websites, wherein the first website is included in the dictionary.
9. The computer-readable storage media of claim 6, further including generating a dictionary of quantitative metrics using machine learning algorithms.
10. The computer-readable storage media of claim 6, further including analyzing a Document Object Model for the first website by searching for tags within the Document Object Model and extracting the content for the first website associated with the tags.
11. The computer-readable storage media of claim 6, wherein performing the content-based comparison further comprises: generating first quantitative metrics for the first website; generating second quantitative metrics for the second website; comparing the first and second quantitative metrics.
12. The computer-readable storage media of claim 11, wherein determining that the first website has substantially similar content to the second website further comprises measuring whether a threshold amount of the first quantitative metrics match the second quantitative metrics.
13. The computer-readable storage media of claim 6, wherein comparing the source addresses between the first website and the second website further includes comparing at least a base part of the source addresses and determining whether the base parts of the source addresses for the second website are not associated with an owner of the first website.
14. The computer-readable storage media of claim 6, wherein performing a content-based comparison between a first website and a second website includes weighting content associated with sensitive information more heavily than other content.
15. The computer-readable storage media of claim 14, wherein the content with sensitive information includes login information associated with receiving a user identification and password.
16. The computer-readable storage media of claim 6, further including generating a plurality of reference values associated with authorized websites including the first website and scanning a plurality of server computers in a virtual environment for the second website to cleanse the virtual environment of unauthorized websites.
17. The computer-readable storage media of claim 6, further including receiving user input on weightings of which content is more important than other content.
18. The computer-readable storage media of claim 6, wherein the content-based comparison includes page composition and style elements.
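The two-stage flow in the claims (a content threshold check, then a comparison of link source addresses) can be sketched as follows. This is an illustrative reading, not the claimed implementation: `LinkCollector`, `link_bases`, `foreign_bases`, `classify`, the 0.8 threshold, the result labels, and the sample hosts are all hypothetical. The sketch also assumes that the "base part" of a source address is the URL's hostname, and that a host is "associated with the owner" when it equals or is a subdomain of a known owner host. The account-information and logo pixel comparisons are not modeled.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Gathers the values of href, src, and action attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src", "action") and value:
                self.links.append(value)


def link_bases(html):
    """Return the set of hostnames referenced by absolute links in a page."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    bases = set()
    for link in collector.links:
        host = urlparse(link).hostname  # None for relative links
        if host:
            bases.add(host.lower())
    return bases


def foreign_bases(candidate_html, owner_hosts):
    """Hosts linked from the candidate that are not tied to the owner."""
    return {
        host for host in link_bases(candidate_html)
        if not any(host == o or host.endswith("." + o) for o in owner_hosts)
    }


def classify(content_score, candidate_html, owner_hosts, threshold=0.8):
    """Run link analysis only when the content score reaches the threshold."""
    if content_score < threshold:
        return "dissimilar"
    if foreign_bases(candidate_html, owner_hosts):
        return "potentially unauthorized"
    return "consistent with authorized site"


# Hypothetical candidate: reuses the owner's logo but posts logins elsewhere.
candidate = ('<form action="https://evil.example.net/login">'
             '<img src="https://static.examplebank.com/logo.png">'
             '<a href="/help">Help</a></form>')
verdict = classify(0.9, candidate, {"examplebank.com"})
```

With a content score of 0.9 the candidate passes the threshold, and because its form posts to a host outside `examplebank.com`, the sketch labels it "potentially unauthorized". Weighting login-related content more heavily, as in claims 14 and 15, would change how the content score is computed rather than this decision step.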