Prof. A G Ramakrishnan and Dr. Shiva Kumar H R develop a Kannada OCR that beats Google’s Tesseract OCR

ACM Transactions on Asian and Low-Resource Language Information Processing (ACM-TALLIP) has accepted this work for publication. The authors have shown conclusively that their Lipi Gnani OCR beats Google’s latest version of the Tesseract OCR for Kannada on the benchmark datasets available on GitHub: Lipi Gnani is more than three times as fast and marginally more accurate. One of the reviewers said that the paper can be considered a benchmark for Kannada document recognition and recommended it for the best paper award.

The Indian Institute of Science has now licensed this OCR to RaGaVeRa Indic Technologies. Another startup, BhaShiNi Digitization Services, is using Lipi Gnani to digitize hundreds of Kannada books by famous authors into e-books that can be read on devices such as the Amazon Kindle. The Karnataka Startup Cell has funded the startup for this work. They are also successfully running the OCR on a Raspberry Pi processor!

Reference: Prof. A G Ramakrishnan and Dr. Shiva Kumar H R, “Lipi Gnani – A Versatile OCR for Documents in any Language Printed in Kannada Script”, ACM Transactions on Asian and Low-Resource Language Information Processing, Vol. 1, No. 1, Article 1, February 2020.