Original

Build a Trained Data of Tesseract OCR engine for Tifinagh Script Recognition

By
Ali Benaissa

ENSAH, Laboratory of Applied Science - Data Science and Competitive Intelligence Team (DSCI), Abdelmalek Essaadi University (UAE), Tetouan, Morocco. The National School of Management Tangier, Governance and Performance of Organizations laboratory - Finance and Governance of Organizations team, Abdelmalek Essaadi University, Tangier, Morocco

Abdelkhalak Bahri

ENSAH, Laboratory of Applied Science - Data Science and Competitive Intelligence Team (DSCI), Abdelmalek Essaadi University (UAE), Tetouan, Morocco

Ahmad El Allaoui

Faculty of Sciences and Techniques Errachidia, Engineering Sciences and Techniques Laboratory - Decisional Computing and Systems Modelling Team, Moulay Ismail University of Meknes, Morocco

My Abdelouahab Salahddine

The National School of Management Tangier, Governance and Performance of Organizations laboratory - Finance and Governance of Organizations team, Abdelmalek Essaadi University, Tangier, Morocco


Abstract

This article presents a methodology for building trained data that enables the Tesseract OCR engine to recognize the Tifinagh script. Tifinagh is widely used in North Africa, yet Tesseract provides no built-in recognition support for it. To overcome this limitation, our approach proceeds through image generation, box generation, manual editing, charset extraction, and dataset compilation. Using Python scripting, specialized software tools, and Tesseract's training utilities, we systematically create a comprehensive dataset for Tifinagh script recognition. The dataset supports the training and evaluation of machine learning models for accurate character recognition. Experimental results show high accuracy, precision, recall, and F1 score, with an accuracy rate of 99.97%, which confirms the effectiveness of the dataset and its potential for practical applications. The discussion underscores that this performance exceeds previously reported results for Tifinagh character recognition. The methodology extends the capabilities of OCR technology and encourages further research in Tifinagh script recognition, helping to unlock the information contained in Tifinagh documents.
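As an illustration only, the following Python sketch shows what three of the steps named in the abstract might look like: charset extraction, box generation, and the computation of accuracy, precision, recall, and F1 score. The authors' actual scripts are not reproduced on this page, so the function names (`extract_charset`, `box_line`, `evaluate`) are hypothetical. The use of macro-averaged, character-aligned metrics is also an assumption rather than the paper's stated protocol. The only external facts relied on are the Unicode Tifinagh block (U+2D30 to U+2D7F) and the Tesseract box-file line layout (glyph, left, bottom, right, top, page, with the origin at the bottom-left of the image).

```python
from collections import Counter

# Unicode Tifinagh block, U+2D30..U+2D7F
TIFINAGH_BLOCK = range(0x2D30, 0x2D80)


def extract_charset(lines):
    """Collect the distinct Tifinagh code points found in the training text."""
    return sorted({ch for line in lines for ch in line if ord(ch) in TIFINAGH_BLOCK})


def box_line(char, left, bottom, right, top, page=0):
    """Format one Tesseract box-file entry: glyph, bounding box, page index."""
    return f"{char} {left} {bottom} {right} {top} {page}"


def evaluate(truth, predicted):
    """Accuracy plus macro-averaged precision, recall and F1.

    Assumption: the two strings are already aligned character by character.
    """
    if not truth or len(truth) != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(truth, predicted):
        if t == p:
            tp[t] += 1
        else:
            fn[t] += 1  # the true character was missed
            fp[p] += 1  # the predicted character was wrongly emitted

    classes = sorted(set(truth) | set(predicted))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(classes)
    return {
        "accuracy": sum(tp.values()) / len(truth),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy data: U+2D30 (YA), U+2D31 (YAB), U+2D33 (YAG); not taken from the paper.
    truth = "\u2D30\u2D31\u2D30\u2D33"
    predicted = "\u2D30\u2D31\u2D31\u2D33"
    print(extract_charset([truth, "abc 123"]))
    print(box_line("\u2D30", 10, 12, 42, 58))
    print(evaluate(truth, predicted))
```

In a real pipeline the box coordinates would come from Tesseract's own box generation followed by manual correction, and the charset would be produced by Tesseract's training utilities. The helpers above only mirror the shape of those outputs.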

How to Cite

1. Benaissa A, Bahri A, El Allaoui A, Abdelouahab Salahddine M. Build a Trained Data of Tesseract OCR engine for Tifinagh Script Recognition. Data and Metadata [Internet]. 2023 Dec. 9 [cited 2024 May 17];2:185. Available from: https://dm.saludcyt.ar/index.php/dm/article/view/185

The article is distributed under the Creative Commons Attribution 4.0 License. Unless otherwise stated, associated published material is distributed under the same licence.

