Spanish Historical Handwritten Text Recognition with Deep Learning

Gustavo Jorge Choque Dextre, César Beltrán Castañón, Ferdinand Pineda Ancco

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The recognition of historical texts presents a significant challenge due to a range of factors, such as the physical deterioration of manuscripts and the diverse, complex writing styles commonly found in these documents. These factors complicate the accurate interpretation and processing of historical documents. In recent years, numerous handwritten text recognition (HTR) models have been developed, targeting a variety of languages including English, Chinese, Arabic and Japanese, among others. Despite this progress, there has been a notable lack of HTR initiatives specifically focused on the Spanish language, mainly due to the scarcity of publicly available datasets that could support the development of solutions for this language. This publication presents the application of Deep Learning techniques based on an Encoder-Decoder Neural Network architecture and Gated Convolutional Neural Networks (Gated-CNN), which in recent years have demonstrated outstanding results in addressing this problem. Additionally, Transfer Learning is employed to improve the accuracy of recognition of historical texts in Spanish. The experiments show that these methods can provide outstanding results; in addition, other techniques such as Data Augmentation and N-gram Language Models lead to significant improvements in the results. A new dataset of historical texts in Spanish is also proposed, made up of 1000 elements taken from Peruvian historical texts referring to the 18th century.
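As an illustration of the gating idea behind Gated-CNN layers, the following is a minimal sketch in plain Python. It assumes one common formulation, in which a convolution output is multiplied element-wise by a sigmoid gate computed from a second convolution over the same input. The abstract does not specify the exact variant, layer sizes, or framework used in the paper, so the function names (`conv1d`, `gated_conv1d`) and the kernels are hypothetical and chosen only for this example.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def conv1d(signal, kernel, bias=0.0):
    """Valid (no padding) 1D cross-correlation of a sequence with a kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def gated_conv1d(signal, feature_kernel, gate_kernel, gate_bias=0.0):
    """Hypothetical gated convolution: features scaled by a learned gate.

    Assumed formulation: conv(x, W_f) * sigmoid(conv(x, W_g) + b_g).
    A gate near 1 lets a feature through; a gate near 0 suppresses it.
    """
    features = conv1d(signal, feature_kernel)
    gates = [sigmoid(g) for g in conv1d(signal, gate_kernel, gate_bias)]
    return [f * g for f, g in zip(features, gates)]


# Toy input: a ramp, with a difference kernel as the feature extractor.
signal = [0.0, 1.0, 2.0, 3.0, 4.0]
half_open = gated_conv1d(signal, [1.0, 0.0, -1.0], [0.0, 0.0, 0.0])
closed = gated_conv1d(signal, [1.0, 0.0, -1.0], [0.0, 0.0, 0.0], gate_bias=-50.0)
print(half_open)
print(closed)
```

With an all-zero gate kernel and zero bias, every gate equals 0.5, so each feature is halved. With a strongly negative gate bias, the gates close and the features are suppressed. In a trained network the gate weights are learned, which lets the layer decide per position which features to pass on to the decoder.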

Original language: English
Title of host publication: Information Management and Big Data - 11th Annual International Conference, SIMBig 2024, Proceedings
Editors: Juan Antonio Lossio-Ventura, Eduardo Ceh-Varela, Eduardo Díaz, Freddy Paz Espinoza, Claude Tadonki, Hugo Alatrista-Salas
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 329-341
Number of pages: 13
ISBN (Print): 9783031914270
Status: Published - 2025
Event: 11th Annual International Conference on Information Management and Big Data, SIMBig 2024 - Ilo, Peru
Duration: 20 Nov 2024 to 22 Nov 2024

Publication series

Name: Communications in Computer and Information Science
Volume: 2496 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 11th Annual International Conference on Information Management and Big Data, SIMBig 2024
Country/Territory: Peru
City: Ilo
Period: 20/11/24 to 22/11/24
