Automated document extraction and classification

Bibliographic details
Main authors: Patchirajan, Karpaga Ganesh; Shamanna, Ram Mohan; Subramanian, Trichur Shivaramakrishnan; Douthit, Ronnie Douglas; Reddy, Chiranjeev Jagannadha; Mohapatra, Deepankar; Huang, Yexin; Mattey, Amar J; Duraisami, Chinnadurai
Format: Patent
Language: eng
Description
Summary: A method includes receiving a source file containing a plurality of documents which, to a computer, are initially indistinguishable from each other. A first classification stage, using convolutional neural network image classification, is applied to the source file to identify source documents among the plurality of documents and to produce a partially parsed file having a plurality of identified source documents. The partially parsed file includes sub-images corresponding to the plurality of identified source documents. A second classification stage, including a natural language processing artificial intelligence, is applied to sets of text in bounding boxes of the sub-images to classify each of the identified source documents as a corresponding sub-type of document. Each set of text corresponds to one of the sub-images. A parsed file having a plurality of identified sub-types of documents is produced. The parsed file is then processed further by computer.
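
The two-stage flow in the summary can be sketched roughly as follows. This is an illustrative outline only, not the patented implementation: the names `SubImage`, `stage_one`, `stage_two`, `fake_detector` and `keyword_classifier` are hypothetical, the detector is a plain stub standing in for the convolutional neural network, and the keyword matcher stands in for the natural language processing model. The sub-type labels ("invoice", "receipt") are likewise assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

BoundingBox = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class SubImage:
    """One identified source document inside the source file."""
    box: BoundingBox
    text: str                  # text found inside the bounding box
    doc_type: str = "unknown"  # set by the first stage
    sub_type: str = "unknown"  # set by the second stage


Detection = Tuple[BoundingBox, str, str]  # (box, doc_type, text)


def stage_one(pages: Iterable[Dict],
              detector: Callable[[Dict], List[Detection]]) -> List[SubImage]:
    """First stage: locate source documents and emit one sub-image per hit.

    The returned list plays the role of the "partially parsed file".
    """
    partially_parsed = []
    for page in pages:
        for box, doc_type, text in detector(page):
            partially_parsed.append(SubImage(box=box, text=text, doc_type=doc_type))
    return partially_parsed


def stage_two(partially_parsed: List[SubImage],
              text_classifier: Callable[[str], str]) -> List[SubImage]:
    """Second stage: assign a sub-type from the text of each sub-image."""
    for sub in partially_parsed:
        sub.sub_type = text_classifier(sub.text)
    return partially_parsed


def fake_detector(page: Dict) -> List[Detection]:
    # Assumption: a stub in place of the CNN; the page already lists its regions.
    return [(r["box"], "form", r["text"]) for r in page["regions"]]


def keyword_classifier(text: str) -> str:
    # Assumption: a keyword lookup in place of the NLP model.
    lowered = text.lower()
    if "invoice" in lowered:
        return "invoice"
    if "receipt" in lowered:
        return "receipt"
    return "other"


def parse_source_file(pages, detector=fake_detector, text_classifier=keyword_classifier):
    """Run both stages and return the fully parsed list of sub-images."""
    return stage_two(stage_one(pages, detector), text_classifier)


if __name__ == "__main__":
    pages = [{"regions": [
        {"box": (0, 0, 100, 50), "text": "Invoice No. 17"},
        {"box": (0, 60, 100, 120), "text": "Receipt for payment"},
    ]}]
    for sub in parse_source_file(pages):
        print(sub.box, sub.doc_type, sub.sub_type)
```

Passing the detector and the text classifier in as functions keeps the two stages separate, so either stub could be swapped for a trained model without changing the surrounding flow.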