A Strategy for Selecting Classes of Symbols from Classes of Graphemes in HMM-Based Handwritten Word Recognition

Cinthia O. A. Freitas, Flávio Bortolozzi, Robert Sabourin
DOI: https://doi.org/10.21529/RESI.2004.0301003

Abstract

This paper presents a new strategy for selecting classes of symbols from classes of graphemes in HMM-based handwritten word recognition of Brazilian legal amounts. It discusses features, graphemes and symbols, since our baseline system follows a global approach that uses HMMs and avoids the explicit segmentation of words into letters or pseudo-letters. In this framework, the input data, that is, the observations visible to the Hidden Markov Model, are the symbols of an alphabet based on graphemes extracted from the word images. The idea is to introduce high-level concepts, such as perceptual features (loops, ascenders, descenders, concavities and convexities), and to provide fast and informative feedback about the information contained in each class of grapheme for symbol class selection. The paper presents an algorithm in which Mutual Information and the HMM work within the same evaluation process. Finally, the experimental results demonstrate that an alphabet of 29 symbols can be selected from the “original” set of 94 graphemes. We conclude that the discriminating power of each grapheme is very important for consolidating an alphabet of symbols.
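
As an illustration of the Mutual Information measure named in the abstract, the following is a minimal sketch of how the information shared between a grapheme class and a word class could be estimated from co-occurrence counts. It is not the authors' algorithm: the function name `mutual_information`, the input format (pairs of grapheme label and word label) and the example labels are assumptions made only for this sketch, and the HMM part of the evaluation process is not shown.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Estimate I(G; W) in bits from (grapheme_class, word_class) pairs.

    Hypothetical helper for illustration; probabilities are plain
    relative frequencies with no smoothing.
    """
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one observation")

    joint = Counter(pairs)                    # counts of (g, w)
    g_counts = Counter(g for g, _ in pairs)   # marginal counts of g
    w_counts = Counter(w for _, w in pairs)   # marginal counts of w

    mi = 0.0
    for (g, w), c in joint.items():
        # p(g, w) * log2( p(g, w) / (p(g) * p(w)) ), written with counts
        mi += (c / n) * math.log2(c * n / (g_counts[g] * w_counts[w]))
    return mi


# Assumed toy data: the grapheme label fully determines the word label.
dependent = [("loop", "dois"), ("ascender", "mil")] * 2
# Assumed toy data: grapheme label and word label are independent.
independent = [("loop", "dois"), ("loop", "mil"),
               ("ascender", "dois"), ("ascender", "mil")]

mi_dependent = mutual_information(dependent)
mi_independent = mutual_information(independent)
```

Under these assumptions, a grapheme class whose occurrence says nothing about the word class contributes zero bits, while one that fully determines a two-way choice contributes one bit. This is the sense in which a grapheme's discriminating power can be scored.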


Keywords

Features; Mutual Information; HMM; Handwritten Word Recognition

