Meeting the Needs of Low-Resource Languages: The Value of Automatic Alignments via Pretrained Models

Abteen Ebrahimi, Arya D. McCarthy, Arturo Oncevay, Luis Chiruzzo, John E. Ortega, Gustavo A. Giménez-Lugo, Rolando Coto-Solano, Katharina Kann

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Large multilingual models have inspired a new class of word alignment methods, which work well for the model's pretraining languages. However, the languages most in need of automatic alignment are low-resource and, thus, not typically included in the pretraining data. In this work, we ask: How do modern aligners perform on unseen languages, and are they better than traditional methods? We contribute gold-standard alignments for Bribri-Spanish, Guarani-Spanish, Quechua-Spanish, and Shipibo-Konibo-Spanish. With these, we evaluate state-of-the-art aligners with and without model adaptation to the target language. Finally, we also evaluate the resulting alignments extrinsically through two downstream tasks: named entity recognition and part-of-speech tagging. We find that although transformer-based methods generally outperform traditional models, the two classes of approach remain competitive with each other.

Original language: English
Title of host publication: EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 3894-3908
Number of pages: 15
ISBN (electronic): 9781959429449
State: Published - 2023
Externally published
Event: 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023 - Dubrovnik, Croatia
Duration: 2 May 2023 → 6 May 2023

Publication series

Name: EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference

Conference

Conference: 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023
Country/Territory: Croatia
City: Dubrovnik
Period: 2/05/23 → 6/05/23
