END-TO-END SYSTEM FOR EXTRACTING TABULAR DATA PRESENT IN ELECTRONIC DOCUMENTS AND METHOD THEREOF


Bibliographic details
Main authors: PERIYAKARUPPAN, Nandhinee; GOYAL, Anil; KRISHNAMOORTHY, Harinath; SANTHIAPPAN, Sudarsun
Format: Patent
Language: English
Description
Summary: The present disclosure describes a method, a system, and a computer-readable medium for extracting tabular data from a document. The method detects at least one table in the document using a deep-learning-based model together with a statistical method. It then identifies the type of the table from the count of horizontal and vertical lines, the presence of outer borders, and the presence of row-column intersections in the table. The type is one of three: bordered, partially bordered, or borderless. The detected table is processed according to its type to identify the cells it contains. Finally, the method generates an output file by extracting the tabular data, which includes performing optical character recognition on the identified cells.
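The table-type step in the summary can be illustrated with a minimal Python sketch. It is not the patented method: the record gives only the inputs (line counts, outer borders, row-column intersections) and the three output types, so the decision rule, the `LineStats` container, the `classify_table` function, and the `grid_ratio` threshold below are all hypothetical. The sketch assumes the line statistics have already been measured by an earlier image-processing stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineStats:
    """Hypothetical container for line measurements taken from a table image."""
    horizontal: int         # number of horizontal ruling lines found
    vertical: int           # number of vertical ruling lines found
    has_outer_border: bool  # True if a closed outer frame was found
    intersections: int      # number of row-column line crossings found


def classify_table(stats: LineStats, grid_ratio: float = 0.9) -> str:
    """Return 'bordered', 'partially bordered', or 'borderless'.

    Assumed rule (not taken from the source): a table with no ruling
    lines is borderless; a table with an outer frame and (almost) all
    of the crossings a full grid would have is bordered; anything else
    is partially bordered. grid_ratio is an illustrative tolerance for
    crossings missed by a noisy line detector.
    """
    if stats.horizontal == 0 and stats.vertical == 0:
        return "borderless"

    # A complete grid has one crossing per (horizontal, vertical) line pair.
    expected = stats.horizontal * stats.vertical
    is_full_grid = (
        stats.has_outer_border
        and stats.horizontal >= 2
        and stats.vertical >= 2
        and stats.intersections >= grid_ratio * expected
    )
    return "bordered" if is_full_grid else "partially bordered"


if __name__ == "__main__":
    print(classify_table(LineStats(4, 3, True, 12)))
    print(classify_table(LineStats(2, 0, False, 0)))
    print(classify_table(LineStats(0, 0, False, 0)))
```

A downstream stage could dispatch on the returned label to choose a cell-identification strategy per table type, which is the role the summary assigns to the type.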