TY - GEN
T1 - Meeting the Needs of Low-Resource Languages: The Value of Automatic Alignments via Pretrained Models
T2 - 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023
AU - Ebrahimi, Abteen
AU - McCarthy, Arya D.
AU - Oncevay, Arturo
AU - Chiruzzo, Luis
AU - Ortega, John E.
AU - Giménez-Lugo, Gustavo A.
AU - Coto-Solano, Rolando
AU - Kann, Katharina
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
AB - Large multilingual models have inspired a new class of word alignment methods, which work well for the model's pretraining languages. However, the languages most in need of automatic alignment are low-resource and, thus, not typically included in the pretraining data. In this work, we ask: How do modern aligners perform on unseen languages, and are they better than traditional methods? We contribute gold-standard alignments for Bribri-Spanish, Guarani-Spanish, Quechua-Spanish, and Shipibo-Konibo-Spanish. With these, we evaluate state-of-the-art aligners with and without model adaptation to the target language. Finally, we also evaluate the resulting alignments extrinsically through two downstream tasks: named entity recognition and part-of-speech tagging. We find that although transformer-based methods generally outperform traditional models, the two classes of approach remain competitive with each other.
UR - http://www.scopus.com/inward/record.url?scp=85159851251&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85159851251
T3 - EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
SP - 3894
EP - 3908
BT - EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
Y2 - 2 May 2023 through 6 May 2023
ER -