STATISTICAL ERROR REDUCTION IN CHARACTER RECOGNITION SYSTEMS


Bibliographic details
Main authors: ROBERT B. HENNIS, ARTHUR HAMBURGEN, THERON FOSDICK
Format: Patent
Language: English
Description
Summary: 1,236,455. Character recognition. INTERNATIONAL BUSINESS MACHINES CORP. 30 Aug., 1968 [8 Sept., 1967], No. 41418/68, Heading G4R. Word classifying apparatus for use with a character reader comprises an error word generator for generating, from an input word, the error words into which the reader might change the input word when reading, together with the probability of each change; a ratio calculator for calculating the ratio of the frequency of usage of an error word as a legitimate word to the probability of an input word being changed into it; and classifying means for classifying each input word and error word in accordance with the output of the ratio calculator.
A confusion pair file 12 holds a series of pairs of letters which a character reader is likely to confuse, i.e. to recognize the first of the pair as the second, each pair being accompanied by the probability P of the confusion. Each name in a file 10 of common names is taken in turn, and each of its letters in turn is compared against the first letter of each pair from file 12. On equality, an error name is generated from the common name by replacing the matching letter with the second letter of the pair. The error name is compared at 16 with the names in a file 18 to determine whether it is a legitimate name in its own right. If it is, a ratio calculator 20 calculates the ratio N_L / N_E, where N_L, read from file 18, is the number of occurrences of the error name as a legitimate name in a population, and N_E is the number of times the error name would be produced in mistake for the common name. N_E is obtained by multiplying the probability P of letter confusion, from file 12, by the number N_C of occurrences of the common name in the population, from file 10.
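The error-name generation and ratio calculation described above can be sketched in Python as follows. This is only an illustrative sketch, not the patented apparatus: the function names, the in-memory dictionaries standing in for files 10, 12 and 18, and all names, counts and probabilities in the sample data are assumptions made for the example.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# One confusion pair: (letter on the document, letter the reader reports, P).
ConfusionPair = Tuple[str, str, float]


def generate_error_names(
    common_name: str, confusion_pairs: List[ConfusionPair]
) -> Iterator[Tuple[str, float]]:
    """Yield (error_name, P) for every single-letter substitution.

    Each letter of the common name is compared against the first letter
    of each confusion pair; on equality the letter is replaced by the
    second letter of the pair.
    """
    for position, letter in enumerate(common_name):
        for true_letter, read_letter, probability in confusion_pairs:
            if letter == true_letter:
                error_name = (
                    common_name[:position] + read_letter + common_name[position + 1:]
                )
                yield error_name, probability


def likelihood_ratio(
    error_name: str,
    probability: float,
    common_count: int,
    legitimate_counts: Dict[str, int],
) -> Optional[float]:
    """Return N_L / N_E, or None if the error name is not a legitimate name.

    N_L is the count of the error name as a legitimate name (file 18);
    N_E = P * N_C is the expected number of times the common name is
    misread as the error name.
    """
    n_l = legitimate_counts.get(error_name)
    if n_l is None:
        return None
    n_e = probability * common_count
    return n_l / n_e


if __name__ == "__main__":
    # Hypothetical sample data, invented for illustration only.
    pairs = [("A", "O", 0.01), ("L", "I", 0.05)]   # stands in for file 12
    common_names = {"HALL": 5000}                  # file 10: name -> N_C
    legitimate_names = {"HALL": 5000, "HOLL": 20}  # file 18: name -> N_L

    for name, n_c in common_names.items():
        for error_name, p in generate_error_names(name, pairs):
            ratio = likelihood_ratio(error_name, p, n_c, legitimate_names)
            print(name, error_name, p, ratio)
```

A ratio below 1 means the error name is expected to arise more often as a misreading of the common name than as a genuine name. How the classifier turns the ratio into tag bits is not modelled here.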
To use these results to replace some names from a character reader with statistically more likely names before they are fed to an output, and to mark every output name either "accept" or "reject", the error names are sent to a file 26 via a register 24, each error name being followed by the corresponding common name from file 10 if the former is to be replaced by the latter. Each common name from file 10 is also sent. The names are accompanied by "replace" and "accept/reject" tag bits set by a classifier 22 under control of the ratio etc. to indicate: (a) where a common name has a corresponding error name but the latter is not a legitimate name in its own right, that the error name is to be replaced by the common name and the output mark