Guide to OCR for Indic Scripts: Document Recognition and Retrieval
Venu Govindaraju, Srirangaraj (Ranga) Setlur
This is the first comprehensive text on Optical Character Recognition for Indic scripts. It covers many topics and describes OCR systems for eight different scripts: Bangla, Devanagari, Gurmukhi, Gujarati, Kannada, Malayalam, Tamil and Urdu.
Contents
Bangla and Devanagari
A Complete Machine-Printed Gurmukhi OCR System
Progress in Gujarati Document Processing and Character Recognition
Design of a Bilingual Kannada-English OCR
Recognition of Malayalam Documents
A Complete OCR System for Tamil Magazine Documents
Experiments on Urdu Text Recognition
Generalization of Hindi OCR Using Adaptive Segmentation and Font Files
Online Handwriting Recognition for Indic Scripts
Part II Retrieval of Indic Documents
Enhancing Access to Primary Cultural Heritage Materials of India
Digital Image Enhancement of Indic Historical Manuscripts
GFG-Based Compression and Retrieval of Document Images in Indian Scripts
Word Spotting for Indic Documents to Facilitate Retrieval
Indian Language Information Retrieval
The BBN Byblos Hindi OCR System
Colour Plates
Common terms and phrases
Akshara algorithm alignment Analysis and Recognition annotation approach Arabic background Bangla binarization black pixels bounding box character image character recognition character segmentation classifier combination computed Conference on Document connected components consonant corpus detection dictionary Document Analysis document image English error evaluation feature extraction feature vector font Gabor filters glyphs Govindaraju graph graphemes Gujarati Gujarati script Gurmukhi Handwriting Recognition handwritten headline Hindi horizontal ICDAR IEEE Indian languages Indic scripts input International Conference Kannada keyword spotting labels large number layout lower zone Malayalam manuscripts matching method middle zone neural network number of classes OCR system optical character recognition output Pattern Recognition pixel preprocessing printed recognized representation samples Sanskrit scanned shape shirorekha shown in Fig skew strip stroke structure sub-symbol support vector machines symbols Table Tamil techniques Technology Telugu template text blocks text line Unicode upper zone Urdu vertical vowel modifiers wavelet word image