Duplicate filtering in a data processing environment
Main authors: | , , , , , |
---|---|
Format: | Patent |
Language: | English |
Subjects: | |
Online access: | Order full text |
Summary: | A data processing method is provided. The method comprises collecting a stream of data records received from one or more data sources connected in a communications network; dividing the stream of data records into sets of data records for parallel processing by a plurality of concurrently running tasks, wherein a first task loads a first persistent index associated with a first set of data records into memory to generate an in-memory version of the first persistent index; and identifying duplicate and non-duplicate data records in the first set of data records by searching the in-memory version of the first persistent index. |
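
The steps in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation. The summary does not say how records are keyed, how the stream is divided, or how the index is stored, so everything concrete here is an assumption made for illustration: records are `(key, payload)` pairs, records are assigned to sets by a CRC32 hash of the key, each set has its own persistent index kept as a text file with one key per line, and each task writes the updated index back when it finishes. All function names are hypothetical.

```python
import tempfile
import zlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def partition(records, num_sets):
    """Divide the collected records into sets by a stable hash of the key.

    Assumption: hashing the key sends equal keys to the same set, so each
    task only needs the index that belongs to its own set.
    """
    sets = [[] for _ in range(num_sets)]
    for key, payload in records:
        sets[zlib.crc32(key.encode()) % num_sets].append((key, payload))
    return sets


def filter_set(index_path, record_set):
    """One task: load the persistent index into memory and classify records."""
    # In-memory version of the persistent index (assumed: one key per line).
    seen = set(index_path.read_text().splitlines()) if index_path.exists() else set()
    unique, duplicates = [], []
    for key, payload in record_set:
        if key in seen:
            duplicates.append((key, payload))
        else:
            seen.add(key)
            unique.append((key, payload))
    # Assumption: write the updated index back so later batches see these keys.
    index_path.write_text("\n".join(sorted(seen)))
    return unique, duplicates


def deduplicate(records, index_dir, num_sets=4):
    """Run one concurrently executing task per set and merge the results."""
    sets = partition(records, num_sets)
    paths = [Path(index_dir) / f"index_{i}.txt" for i in range(num_sets)]
    with ThreadPoolExecutor(max_workers=num_sets) as pool:
        results = list(pool.map(filter_set, paths, sets))
    unique = [rec for uniq, _ in results for rec in uniq]
    duplicates = [rec for _, dups in results for rec in dups]
    return unique, duplicates


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as index_dir:
        batch = [("a", 1), ("b", 2), ("a", 3)]
        unique, duplicates = deduplicate(batch, index_dir, num_sets=2)
        print("unique:", unique)
        print("duplicates:", duplicates)
```

Each task touches only its own index file, so the tasks in this sketch need no locking between them. A thread pool keeps the example short; separate processes or machines could do the same work, since each task depends only on its own set and index.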