SELF-IMPROVING DOCUMENT CLASSIFICATION AND SPLITTING FOR DOCUMENT PROCESSING IN ROBOTIC PROCESS AUTOMATION
Main authors: | , , |
---|---|
Format: | Patent |
Language: | eng |
Summary: | Systems and methods for classifying and splitting an electronic file into a plurality of extracted documents are provided. The electronic file is received. An initial portion of the electronic file is classified using a trained classifier and extracted from the electronic file as an extracted document associated with the classification. It is iteratively determined whether each respective next portion of the electronic file should be added to the extracted document until it is determined that the respective next portion should not be added to the extracted document. In response to determining that the respective next portion should be added to the extracted document, the respective next portion is extracted from the electronic file and added to the extracted document. In response to determining that the respective next portion should not be added to the extracted document, the classifying and the iteratively determining are repeated using the respective next portion as the initial portion. The extracted documents are output. The trained classifier can be trained to learn sets of word vectors and other relevant information associated with document classifications, in order to improve accuracy. |
---|
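
The iterative classify-and-split loop described in the summary can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the names `split_file`, `toy_classify`, and `toy_belongs_to` are hypothetical, a "portion" is assumed to be one page of text, and a simple keyword lookup stands in for the trained classifier.

```python
from typing import Callable, List, Tuple

Page = str
Document = Tuple[str, List[Page]]  # (classification label, pages)


def split_file(
    pages: List[Page],
    classify: Callable[[Page], str],
    belongs_to: Callable[[List[Page], str, Page], bool],
) -> List[Document]:
    """Split a file into labeled documents, one portion (page) at a time."""
    documents: List[Document] = []
    i = 0
    while i < len(pages):
        # Classify the initial portion and start a new extracted document.
        label = classify(pages[i])
        current = [pages[i]]
        i += 1
        # Keep adding next portions until one should not be added.
        while i < len(pages) and belongs_to(current, label, pages[i]):
            current.append(pages[i])
            i += 1
        documents.append((label, current))
        # The rejected portion (if any) becomes the next initial portion.
    return documents


def toy_classify(page: Page) -> str:
    """Hypothetical stand-in for the trained classifier (keyword lookup)."""
    text = page.lower()
    if "invoice" in text:
        return "invoice"
    if "receipt" in text:
        return "receipt"
    return "unknown"


def toy_belongs_to(current: List[Page], label: str, next_page: Page) -> bool:
    """Assumed rule: a page with no document-type keyword continues the document."""
    return toy_classify(next_page) == "unknown"


pages = [
    "Invoice #1 header",
    "line items continued",
    "Receipt for payment",
    "Invoice #2 header",
]
result = split_file(pages, toy_classify, toy_belongs_to)
for label, doc_pages in result:
    print(label, len(doc_pages))
```

In this sketch the continuation decision is passed in as a separate function, so a learned model (for example, one based on word vectors, as the summary mentions) could replace the keyword rule without changing the loop.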