Combining Data-driven Supervision with Human-in-the-loop Feedback for Entity Resolution
Format: | Article |
---|---|
Language: | English |
Abstract: | The distribution gap between training datasets and the data encountered in
production is widely acknowledged. Training datasets are often constructed over a
fixed period of time by carefully curating the data to be labeled. As a result,
they may not contain all the variations of data that could be
encountered in real-world production environments. Tasked with building an
entity resolution system (a model that identifies and consolidates data points
that represent the same person), we found that our first model exhibited a clear
training-production performance gap. In this case study, we discuss our
human-in-the-loop, data-centric solution for closing that gap. We conclude
with takeaways that apply to data-centric learning at large. |
---|---|
DOI: | 10.48550/arxiv.2111.10497 |