Identification and removal of duplicate event records from a security information and event management database
IPC classification information
Country/Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
G06F-017/00
G06F-017/30
H04L-029/06
Application number
US-0078375
(2016-03-23)
Patent number
US-10108634
(2018-10-23)
Inventor / Address
Pal, Susam
Applicant / Address
EMC IP Holding Company LLC
Attorney / Address
Ryan, Mason & Lewis, LLP
Citation information
Times cited: 0
Cited patents: 3
Abstract
A method comprises receiving information characterizing events from respective ones of a plurality of network devices each comprising one or more event sources, storing event records in a security information and event management database with each event record corresponding to a given event and comprising a device identifier, an event source name, an event time and an event record number, obtaining a set of event records from the security information and event management database for a specified network device in a specified time range, identifying whether respective ones of the event records in the set are duplicate event records based at least in part on mappings of event time and event record number values to ordered pairs of device identifier and event source name values, and removing event records in the set identified as duplicate event records from the security information and event management database.
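The duplicate test summarized in the abstract, and spelled out in claims 9, 11, 13 and 14 below, keeps one entry per (device identifier, event source name) pair holding the latest (event time, record number) seen so far. The following is a minimal Python sketch of that comparison rule under stated assumptions, not the patented implementation: the `EventRecord` class, the `is_duplicate` and `deduplicate` helpers, the integer timestamps and the sample values are all hypothetical. It also assumes that, for each (d, s) pair, records are visited in chronological order, which claims 4 and 5 suggest the storage layout supports; a genuinely new event that arrived out of order would be misclassified as a duplicate.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class EventRecord:
    device_id: str      # d: device identifier
    source_name: str    # s: event source name
    event_time: int     # t: event time (integer timestamp, assumed)
    record_number: int  # n: event record number


Key = Tuple[str, str]    # k = (d, s)
Known = Tuple[int, int]  # (known event time, known record number)


def is_duplicate(rec: EventRecord, h: Dict[Key, Known]) -> bool:
    """Apply the claim 13 comparison against the associative container h."""
    known = h.get((rec.device_id, rec.source_name))
    if known is None:
        return False  # first record seen for this (d, s) pair
    known_t, known_n = known
    return rec.event_time < known_t or (
        rec.event_time == known_t and rec.record_number <= known_n
    )


def deduplicate(records: Iterable[EventRecord]) -> List[EventRecord]:
    """Return the non-duplicate records, visited in the given order."""
    h: Dict[Key, Known] = {}  # a Python dict is a hash table (cf. claim 9)
    kept: List[EventRecord] = []
    for rec in records:
        if is_duplicate(rec, h):
            continue
        # Non-duplicate: (t, n) becomes the new known values for key k (cf. claim 14).
        h[(rec.device_id, rec.source_name)] = (rec.event_time, rec.record_number)
        kept.append(rec)
    return kept


# Hypothetical sample for one device ("fw1") in one time range.
sample = [
    EventRecord("fw1", "Security", 100, 1),
    EventRecord("fw1", "Security", 101, 2),
    EventRecord("fw1", "Security", 101, 2),  # same event received twice
    EventRecord("fw1", "Security", 100, 1),  # older event collected again
    EventRecord("fw1", "Security", 150, 1),  # record numbers reset at a later time
    EventRecord("fw1", "System", 100, 1),    # different event source, separate key
]
kept = deduplicate(sample)
```

On the sample, the repeated and re-collected Security events are dropped, while the record with a reset record number is kept because its later event time takes precedence over the record-number comparison. The map `h` grows with the number of distinct (d, s) pairs rather than with the number of records.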
Claims
1. A method comprising: receiving information characterizing one or more events from respective ones of a plurality of network devices, each network device comprising one or more event sources; storing one or more event records in a security information and event management database, each event record corresponding to a given one of the events and comprising a device identifier, an event source name, an event time and an event record number; obtaining a set of event records from the security information and event management database for a specified one of the network devices in a specified time range; identifying whether respective ones of the event records in the set are duplicate event records based at least in part on mappings of event time and event record number values to ordered pairs of device identifier and event source name values; and removing event records in the set identified as duplicate event records from the security information and event management database; wherein the method is performed by at least one processing device comprising a processor coupled to a memory, the at least one processing device being connected to the plurality of network devices and the security information and event management database over at least one network.

2. The method of claim 1 wherein: a time required for identifying whether respective ones of the event records in the set are duplicate event records and removing the event records in the set identified as duplicate event records increases linearly with a total number of event records in the set; and storage space required for identifying whether respective ones of the event records in the set are duplicate event records and removing the event records in the set identified as duplicate event records is independent of the total number of event records in the set and does not exceed a specified upper bound.

3. The method of claim 1 wherein receiving information characterizing one or more events from respective ones of the plurality of network devices comprises receiving multiple streams of event information from respective ones of the plurality of network devices, each stream of event information comprising at least one of text and binary data relating to one or more events.

4. The method of claim 1 wherein the record numbers for non-duplicate events in the set of event records increment as a function of event time and wherein record numbers for an event source may be reset to an initial value.

5. The method of claim 1 wherein storing the one or more event records in the security information and event management database comprises storing the event records such that an event record for a chronologically first event associated with the specified network device in the specified time range can be accessed in constant time and event records for all subsequent events associated with the specified network device in the specified time range can be accessed in linear time with respect to a total number of event records in the security information and event management database for the specified network device in the specified time range.

6. The method of claim 1 wherein storing the one or more event records in the security information and event management database comprises utilizing a container hierarchy with respective containers indexing events by device identifier and event time, each container in the container hierarchy being implemented as one of: a directory on a filesystem of the security information and event management database; an archive file in the directory; a data file in the directory or the archive file; a section in the data file, wherein the data file comprises a header defining an index with locations of a beginning of each section.

7. The method of claim 1 wherein storing the one or more event records in the security information and event management database comprises: reading the received information characterizing the one or more events to identify device identifiers, event source names, event times and event record numbers for respective ones of the events; and embedding device identifiers, event source names, event times and record numbers for respective events in the event records.

8. The method of claim 7 wherein embedding device identifiers, event source names, event times and record numbers for respective events in the event records comprises implicitly embedding the device identifiers, event source names, event times and record numbers based on storage of the event records in a container hierarchy utilized by the security information and event management database.

9. The method of claim 1 wherein the mappings of event time and event record number values to ordered pairs of device identifier and event source name values are stored in an associative container comprising a hash table.

10. The method of claim 1 wherein the mappings of event time and event record number values to ordered pairs of device identifier and event source name values comprise mappings between: a first set of ordered pairs of device identifier and event source name values; and a second set of ordered pairs of event time and event record number values.

11. The method of claim 1 wherein: identifying whether respective ones of the event records in the set are duplicate event records comprises identifying a given event record as a duplicate event record by comparing the event time for the given event record to a known event time and comparing the record number of the given event record to a known record number; the known event time comprises a maximum event time for the ordered pair of device identifier and event source name values of the given event record in previously-obtained event records for the specified network device in the specified time range; and the known record number comprises a maximum record number for the ordered pair of device identifier and event source name values of the given event record in previously-obtained event records for the specified network device in the specified time range.

12. The method of claim 1 wherein obtaining the set of event records from the security information and event management database for the specified network device in the specified time range comprises traversing a container hierarchy of the security information and event management database to identify a given time-based event container matching a start time of the specified time range or a first available time-based event container in the specified time range.

13. The method of claim 12 wherein identifying whether respective ones of the event records in the set are duplicate event records comprises, for a given event record in the set of event records: reading the given event record to identify d, s, t and n for a given event associated with the given event record, where d is the device identifier for the given event, s is the event source name for the given event, t is the event time for the given event and n is the record number for the given event; determining that the given event record is a duplicate event record if an associative container h contains a mapping of a key k=(d, s) to an ordered pair of a known event time and a known record number and at least one of: t is less than the known event time for key k; and t is equal to the known event time for key k and n is less than or equal to the known record number for key k; and otherwise determining that the given event record is not a duplicate event record.

14. The method of claim 13 wherein removing event records in the set identified as duplicate event records comprises, for the given event record determined not to be a duplicate event record: setting the value of key k for the given event record in the associative container h to the ordered pair (t, n) for the given event record, where t is an updated known event time for key k and n is an updated known record number for key k; copying the given event record to a deduplicated temporary event container file; and replacing an original event container file stored in the security information and event management database with the deduplicated temporary event container file responsive to at least one of: determining that the given event record is a last event in the original event container file and that the storage space consumed by the deduplicated temporary event container file exceeds M−m, where m represents an upper bound on storage space of an event container file, M represents an upper bound on storage space allocated for deduplication of event records, and M≥2m; and determining that the event record is a last event for the specified network device in the specified time range.

15. The method of claim 1 further comprising, in parallel with identifying whether respective ones of the event records in the set are duplicate event records, identifying whether respective ones of event records in one or more other sets of event records for other specified network devices and specified time ranges are duplicate event records.

16. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device cause the at least one processing device: to receive information characterizing one or more events from respective ones of a plurality of network devices, each network device comprising one or more event sources; to store one or more event records in a security information and event management database, each event record corresponding to a given one of the events and comprising a device identifier, an event source name, an event time and an event record number; to obtain a set of event records from the security information and event management database for a specified one of the network devices in a specified time range; to identify whether respective ones of the event records in the set are duplicate event records based at least in part on mappings of event time and event record number values to ordered pairs of device identifier and event source name values; and to remove event records in the set identified as duplicate event records from the security information and event management database.

17. The computer program product of claim 16 wherein: a time required for identifying whether respective ones of the event records in the set are duplicate event records and removing the event records in the set identified as duplicate event records increases linearly with a total number of event records in the set; and storage space required for identifying whether respective ones of the event records in the set are duplicate event records and removing the event records in the set identified as duplicate event records is independent of the total number of event records in the set and does not exceed a specified upper bound.

18. An apparatus comprising: at least one processing device comprising a processor coupled to a memory and implementing a security information and event management system; the security information and event management system being configured: to receive information characterizing one or more events from respective ones of a plurality of network devices, each network device comprising one or more event sources; to store one or more event records in a security information and event management database, each event record corresponding to a given one of the events and comprising a device identifier, an event source name, an event time and an event record number; to obtain a set of event records from the security information and event management database for a specified one of the network devices in a specified time range; to identify whether respective ones of the event records in the set are duplicate event records based at least in part on mappings of event time and event record number values to ordered pairs of device identifier and event source name values; and to remove event records in the set identified as duplicate event records from the security information and event management database.

19. The apparatus of claim 18 wherein the security information and event management database comprises a distributed security information and event management database comprising two or more storage nodes connected over at least one network.

20. The apparatus of claim 19 wherein: a time required for identifying whether respective ones of the event records in the set are duplicate event records and removing the event records in the set identified as duplicate event records increases linearly with a total number of event records in the set; and storage space required for identifying whether respective ones of the event records in the set are duplicate event records and removing the event records in the set identified as duplicate event records is independent of the total number of event records in the set and does not exceed a specified upper bound.
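Claim 14 removes duplicates by copying surviving records into a deduplicated temporary event container file and then substituting it for the original. The Python sketch below illustrates that copy-and-replace step under simplifying assumptions, and is not the patented implementation: each event container is assumed to be a JSON Lines file whose records carry hypothetical fields `d`, `s`, `t` and `n`, containers are passed in chronological order, and each original file is replaced as soon as it has been read, instead of batching against the M−m storage threshold the claim describes. The map `h` is shared across files, so a duplicate that lands in a later container is still caught. The names `dedupe_container_file` and `dedupe_range` are illustrative only.

```python
import json
import os
import tempfile
from typing import Dict, List, Tuple

Key = Tuple[str, str]    # (device identifier d, event source name s)
Known = Tuple[int, int]  # (known event time, known record number)


def dedupe_container_file(path: str, h: Dict[Key, Known]) -> int:
    """Rewrite one container file without duplicates; return the number removed."""
    removed = 0
    # Create the temporary file next to the original so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)), suffix=".dedup")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as dst, open(path, encoding="utf-8") as src:
            for line in src:
                if not line.strip():
                    continue  # skip blank lines
                rec = json.loads(line)
                k = (rec["d"], rec["s"])
                t, n = rec["t"], rec["n"]
                known = h.get(k)
                # Duplicate: older than the known time, or same time and n not beyond the known n.
                if known is not None and (t < known[0] or (t == known[0] and n <= known[1])):
                    removed += 1
                    continue
                h[k] = (t, n)  # update the known (time, record number) for key k
                dst.write(line if line.endswith("\n") else line + "\n")  # copy the survivor
        os.replace(tmp_path, path)  # substitute the deduplicated file for the original
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # discard the partial temporary file; the original is untouched
        raise
    return removed


def dedupe_range(container_paths: List[str]) -> int:
    """Deduplicate one device's containers for one time range, in chronological order."""
    h: Dict[Key, Known] = {}  # shared across files so cross-file duplicates are detected
    total = 0
    for path in container_paths:
        total += dedupe_container_file(path, h)
    return total
```

Writing the temporary file in the same directory means `os.replace` performs a rename on one filesystem, which POSIX requires to be atomic, so a failed run leaves the original container in place. Apart from `h`, which grows with the number of distinct (d, s) pairs, the sketch holds one record at a time and needs at most one container's worth of extra disk space, which is in the spirit of the bounded-storage language in claims 2, 17 and 20.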
Patents cited by this patent (3)
Farber, David A.; Lachman, Ronald D., De-duplication of data in a data processing system.