LEVERAGING A COLLECTION OF TRAINING TABLES TO ACCURATELY PREDICT ERRORS WITHIN A VARIETY OF TABLES

Bibliographic details
Main authors:	WANG PEI, HE YEYE
Format:	Patent
Language:	Chinese; English
Subjects:	CALCULATING; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS

Description
Summary:	The present disclosure relates to systems, methods, and computer-readable media for using a variety of hypothesis tests to identify errors within tables and other structured datasets. For example, systems disclosed herein can generate a modified table from an input table by removing one or more entries from the input table. The systems can further leverage a collection of training tables to determine, for the input table and for the modified table, a probability that the table is drawn from the collection of training tables. The systems can additionally compare these probabilities to determine whether the one or more removed entries contain errors. The systems may be applied to tables of various sizes and types to identify different types of common errors within input tables.
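The summary describes a remove-and-compare test: drop a candidate entry, estimate how likely the original and the modified table are under a collection of training tables, and compare the two probabilities. The following is a minimal sketch of that idea for a single numeric column, not the method claimed in the patent. The statistic (`max_z_score`), the smoothed empirical tail probability, the threshold `min_ratio`, and all function names are assumptions introduced for illustration only.

```python
from statistics import mean, pstdev


def max_z_score(column):
    """Largest absolute z-score in a numeric column (0.0 if the column is constant)."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return 0.0
    return max(abs(x - mu) / sigma for x in column)


def tail_probability(stat, training_stats):
    """Smoothed share of training columns whose statistic is at least `stat`."""
    at_least = sum(1 for s in training_stats if s >= stat)
    return (at_least + 1) / (len(training_stats) + 1)


def flag_suspect_entries(column, training_columns, min_ratio=3.0):
    """Return indices whose removal makes the column much more typical of the training data."""
    if len(column) < 3:
        return []  # too few values to compare meaningfully
    training_stats = [max_z_score(c) for c in training_columns]
    p_input = tail_probability(max_z_score(column), training_stats)
    flagged = []
    for i in range(len(column)):
        modified = column[:i] + column[i + 1:]  # the "modified table"
        p_modified = tail_probability(max_z_score(modified), training_stats)
        # Compare the two probabilities; a large ratio suggests entry i is an error.
        if p_modified / p_input >= min_ratio:
            flagged.append(i)
    return flagged


# Hypothetical training collection of well-behaved columns.
training = [
    [10, 11, 12, 13, 14],
    [5, 6, 5, 7, 6],
    [100, 102, 98, 101, 99],
    [1, 2, 3, 4, 5],
    [20, 22, 21, 23, 19],
    [7, 8, 9, 8, 7],
]
print(flag_suspect_entries([10, 11, 12, 11, 10, 12, 11, 1100], training))  # prints [7]
```

In this sketch an entry is flagged only when removing it raises the estimated probability by at least `min_ratio`, so a column that already resembles the training data yields no flags. A practical system would presumably use richer statistics and handle non-numeric columns, which this example does not attempt.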