Automated document extraction and classification
A method including receiving a source file containing a plurality of documents which, to a computer, initially are indistinguishable from each other. A first classification stage is applied to the source file using a convolutional neural network image classification to identify source documents in t...
Gespeichert in:
Hauptverfasser: | , , , , , , , , |
---|---|
Format: | Patent |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | A method including receiving a source file containing a plurality of documents which, to a computer, initially are indistinguishable from each other. A first classification stage is applied to the source file using a convolutional neural network image classification to identify source documents in the multitude of documents and to produce a partially parsed file having a multitude of identified source documents. The partially parsed file includes sub-images corresponding to the plurality of identified source documents. A second classification stage, including a natural language processing artificial intelligence, is applied to sets of text in bounding boxes of the sub-images, to classify each of the multitude of identified source documents as a corresponding sub-type of document. Each of the sets of text corresponding to one of the sub-images. A parsed file having a multitude of identified sub-types of documents is produced. The parsed file is further computer processed. |
---|