
Arabic Character Recognition using Approximate Stroke Sequence


——————————————————————————–

Professor Mohammed Zeki Khedher* & Dr. Gheith Abandah**

*Dept. of Electrical Engg khedher@ju.edu.jo

**Dept. of Computer Engg abandah@ju.edu.jo

University of Jordan, Amman – Jordan
1st June 2002

——————————————————————————–

Abstract

This paper addresses the recognition of handwritten Arabic characters. A novel approach to Arabic character recognition is presented, based on statistical analysis of a typical Arabic text. The results show that the sub-word, rather than the word, is the basic pictorial block in Arabic. The method of approximate stroke sequence matching is applied to the recognition of some Arabic characters in their stand-alone form. This method could be extended further for more accurate results. It is recommended that future research on Arabic OCR systems take the sub-word, rather than the word, as the basic block.


1.  Introduction

Automatic recognition of handwriting became a mature discipline at the beginning of the 21st century. On-line systems are now available on handheld computers with acceptable performance. Off-line systems are less accurate than on-line systems, but they are now good enough for specialized tasks such as interpreting handwritten postal addresses on envelopes and reading currency amounts on bank checks (Plamondon 2000).

The recognition of Arabic characters is particularly difficult because segmentation is necessary even for printed text. To gain insight into the structure of the Arabic word, and to assess the nature of the problems facing researchers working on Arabic OCR systems, a statistical analysis of typical Arabic text is needed. For this purpose, a reasonably sized sample of Arabic text was selected and analyzed. Based on the results of this analysis, a new procedure is suggested for building Arabic OCR systems. As a first step in the implementation of such systems, recognition of Arabic characters in their stand-alone form is addressed. The method of approximate stroke sequence matching is applied and the results are shown.
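To illustrate the kind of statistic such an analysis can produce, the sketch below splits words into sub-words and tallies their lengths. It assumes the usual definition of a sub-word as a run of connected letters, ended by any letter that does not join to its successor. The letter set, the function names, and the sample text are illustrative assumptions, not the authors' data or procedure, and the input is assumed to carry no diacritics.

```python
from collections import Counter

# Assumption: letters that do not connect to the following letter.
NON_CONNECTING = set("اأإآدذرزوؤ")
# Assumption: the stand-alone hamza joins to neither neighbour.
ISOLATED = "ء"


def split_subwords(word):
    """Split one undiacritized Arabic word into its connected sub-words."""
    subwords, current = [], ""
    for ch in word:
        if ch == ISOLATED and current:
            # The hamza cannot attach to the preceding letters.
            subwords.append(current)
            current = ""
        current += ch
        if ch in NON_CONNECTING or ch == ISOLATED:
            # This letter ends the current sub-word.
            subwords.append(current)
            current = ""
    if current:
        subwords.append(current)
    return subwords


def subword_length_histogram(text):
    """Count how many sub-words of each length occur in the text."""
    lengths = Counter()
    for word in text.split():
        for sub in split_subwords(word):
            lengths[len(sub)] += 1
    return lengths


if __name__ == "__main__":
    sample = "كتاب مدرسة"  # hypothetical two-word sample
    for length, count in sorted(subword_length_histogram(sample).items()):
        print(length, count)
```

A histogram of this kind, computed over a large text, is one way to see how many short sub-words a recognizer has to handle.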

The paper first surveys previous work in the field. It then describes the main characteristics of Arabic writing, presents the importance of the sub-word structure of the Arabic word, reports the statistical results that demonstrate this phenomenon, and proposes a new procedure for Arabic OCR systems. The proposed method treats the sub-word as the basic block in the recognition of Arabic characters. The size of the sub-word should be treated as a decisive factor in choosing the recognition method for the characters it contains. The method of approximate stroke sequence matching is described and then applied to an example in which an unknown character is compared with two standard characters. A text containing different shapes of Arabic characters was written by 48 different persons, and samples of the characters under test were copied from it for this study. Some results of applying this procedure to different characters are given. The paper discusses the results obtained and ends with conclusions and suggestions for future work.
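As a rough illustration of comparing an unknown character with two standard characters, the sketch below encodes each character as a string of stroke codes and picks the template with the smallest edit distance. This is only one common way to do approximate sequence matching. The direction codes (R, L, U, D), the unit edit costs, the template names, and the sequences themselves are assumptions made for the example, not the encoding or cost scheme used in the paper.

```python
def edit_distance(a, b):
    """Levenshtein distance between two stroke-code sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def classify(unknown, templates):
    """Return the name of the template closest to the unknown sequence."""
    return min(templates,
               key=lambda name: edit_distance(unknown, templates[name]))


if __name__ == "__main__":
    # Hypothetical stroke sequences for two standard characters.
    templates = {"template_1": "DDDD", "template_2": "RRDLL"}
    unknown = "RDDLL"  # hypothetical sequence from a handwritten sample
    for name, seq in templates.items():
        print(name, edit_distance(unknown, seq))
    print("closest:", classify(unknown, templates))
```

Unit costs treat every stroke confusion as equally likely. A practical system would probably weight substitutions between similar directions less heavily than those between opposite directions.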
