Associative data bases
The classification of handwritten characters often leads to ambiguities:
a given character can have several possible interpretations. For
example, "1" and "7" often look alike and cannot be distinguished by the
classifier.
Using context information, these ambiguities can usually be resolved.
If, in the case above, the digit is part of an account number and
only one of the two possible digit strings results in an existing
account number, a decision can be made. If both possible account
numbers exist, the names of the account owners can be compared
and the best match decides.
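
The decision rule above can be illustrated with a minimal Python sketch.
It is not the DACCORD implementation: the account table, the function
names and the use of difflib's similarity ratio as the name comparison
are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical account table (account number -> owner name), invented for this sketch.
ACCOUNTS = {
    "4711": "Anna Schmidt",
    "4111": "Bernd Meier",
    "1234": "Clara Vogel",
}


def expand(alternatives):
    """Expand per-position alternatives, e.g. ["4", "17", "1", "1"],
    into every digit string the recognizer output allows."""
    return ["".join(chars) for chars in product(*alternatives)]


def resolve(alternatives, accounts, recognized_name=""):
    """Return the account number selected by context, or None if no candidate exists."""
    hits = [c for c in expand(alternatives) if c in accounts]
    if not hits:
        return None
    if len(hits) == 1:
        # Only one candidate is an existing account: the ambiguity is resolved.
        return hits[0]
    # Several candidates exist: compare the owner names and take the best match.
    # SequenceMatcher.ratio() is a stand-in for the real name comparison.
    return max(
        hits,
        key=lambda n: SequenceMatcher(None, recognized_name, accounts[n]).ratio(),
    )
```

With this table, resolve(["17", "2", "3", "4"], ACCOUNTS) is settled by
existence alone, whereas resolve(["4", "17", "1", "1"], ACCOUNTS,
"Anna Schmldt") needs the owner name to choose between two existing
accounts.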
Based on our recognizer for handwritten characters, we therefore began
to develop the associative data base DACCORD in summer 1995. The goal of
this project was an efficient, portable application that performs well
with large data bases even on standard PCs.
The system was soon installed at one of Germany's leading banks, where
up to 10 credit transfer forms per second can be disambiguated using a
data base of approximately 2.5 million bank accounts. The system runs on
a PC with Microsoft Windows NT.
Ambiguous recognizer output caused by poor handwriting significantly
enlarges the search space. In our system, data bases of up to 2 GByte
can now be searched efficiently using fast hash algorithms.
The comparison of the recognizer output with the data base entries
is based on the Levenshtein distance, which measures the
similarity of two character strings: the distance roughly
corresponds to the number of character insertions, deletions and
replacements that have to be applied to the first string to make it
equal to the second string. By assigning different costs to these
operations, the measure can be tailored to the specific needs of the
application. In our system we match word by word, summing up the
Levenshtein distances for each single word of the search string,
weighted by the word length.
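
As a rough illustration of the distance measure described above, the
following Python sketch computes a Levenshtein distance with
configurable operation costs and combines it word by word. The function
names, the default costs and, in particular, the weighting scheme (a
length-weighted mean of each search word's distance to its closest
entry word) are assumptions; the text does not specify how DACCORD
applies the word-length weights.

```python
def weighted_levenshtein(a, b, ins=1.0, dele=1.0, sub=1.0):
    """Minimum total cost of insertions, deletions and replacements
    that turn string a into string b (row-by-row dynamic programming)."""
    prev = [j * ins for j in range(len(b) + 1)]  # cost of building b[:j] from ""
    for i, ca in enumerate(a, 1):
        cur = [i * dele]  # cost of deleting all of a[:i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + dele,                            # delete ca
                cur[j - 1] + ins,                          # insert cb
                prev[j - 1] + (0.0 if ca == cb else sub),  # keep or replace
            ))
        prev = cur
    return prev[-1]


def word_match_score(query, entry, **costs):
    """Assumed combination rule: match every word of the search string to
    its closest word in the entry and average the distances, weighting
    each word by its length. Lower scores mean better matches."""
    q_words, e_words = query.split(), entry.split()
    if not q_words or not e_words:
        return float("inf")
    total_len = sum(len(w) for w in q_words)
    weighted = sum(
        len(w) * min(weighted_levenshtein(w, e, **costs) for e in e_words)
        for w in q_words
    )
    return weighted / total_len
```

Lowering the replacement cost, for example sub=0.5, makes frequent
recognizer confusions such as "1" versus "7" cheaper than insertions or
deletions. Because each search word is matched independently, the score
in this sketch does not depend on word order.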