The line segmentation algorithm of Indonesian electronic identity card (e-KTP) for data digitization

Bibliographic details
Main authors: Afifah, Yasmine, Sujono, Augustinus, Apribowo, Chico Hermanu Brillianto
Format: Conference proceeding
Language: English
Description
Summary: The Indonesian Electronic Identity Card (e-KTP) is a source of information about its owner's identity and is widely used for administrative purposes. The biodata segment of the e-KTP consists of multiple lines, each differing in length and width, which is a problem when digitizing the data with Optical Character Recognition (OCR). A line segmentation algorithm is therefore needed, and this research proposes one based on rectangular cropping and Tesseract OCR. First, the algorithm crops the owner's biodata and the line indicators. There are three line indicators, located below the 'alamat' (address), 'tempat/tanggal lahir' (place/date of birth), and 'nama' (name) areas. OCR then reads all of the cropped areas. If a line indicator's value is blank, the corresponding segment is known to have two lines. The OCR result is converted into an array separated by lines. The algorithm was exercised under four conditions: e-KTP with two lines of address; two lines of place and date of birth; two lines of name; and one line for every segment. The applied algorithm reached 85% across 30 samples. Failures in line segmentation were caused by a non-optimal threshold value.
ISSN: 0094-243X, 1551-7616
DOI: 10.1063/5.0000670
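
The steps in the summary (crop the biodata and the line indicators, run OCR on each crop, treat a blank indicator as a two-line segment, and split the OCR text into an array of lines) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `segment_lines` and `tesseract_ocr`, the `Box` type, and all crop coordinates are hypothetical, and the summary does not describe the thresholding or preprocessing step, so none is shown. The sketch assumes a Pillow-style image object with a `crop(box)` method and uses `pytesseract.image_to_string` as the Tesseract binding.

```python
from typing import Callable, Dict, List, Tuple

# (left, upper, right, lower), the box convention used by Pillow's Image.crop
Box = Tuple[int, int, int, int]


def tesseract_ocr(region) -> str:
    """Run Tesseract on a cropped region (pytesseract is imported lazily)."""
    import pytesseract
    return pytesseract.image_to_string(region)


def segment_lines(
    image,
    biodata_box: Box,
    indicator_boxes: Dict[str, Box],
    ocr: Callable = tesseract_ocr,
) -> Tuple[List[str], Dict[str, bool]]:
    """Return the OCR'd biodata lines and a per-field two-line flag.

    indicator_boxes maps a field name (e.g. 'nama') to the hypothetical
    rectangle below that field's label.
    """
    # Rectangular crop of the biodata area, then OCR, then split into
    # an array of non-empty lines.
    text = ocr(image.crop(biodata_box))
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    # A blank line indicator means the field above it spans two lines.
    has_two_lines = {
        field: ocr(image.crop(box)).strip() == ""
        for field, box in indicator_boxes.items()
    }
    return lines, has_two_lines
```

Passing the OCR step in as a parameter keeps the segmentation logic separate from Tesseract, so the blank-indicator rule can be checked with a stub before real card images and crop coordinates are available.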