TY - GEN
T1 - Findings of the AmericasNLP 2024 Shared Task on Machine Translation into Indigenous Languages
AU - Ebrahimi, Abteen
AU - de Gibert, Ona
AU - Vázquez, Raúl
AU - Coto-Solano, Rolando
AU - Denisov, Pavel
AU - Pugh, Robert
AU - Mager, Manuel
AU - Oncevay, Arturo
AU - Chiruzzo, Luis
AU - von der Wense, Katharina
AU - Rijhwani, Shruti
N1 - Publisher Copyright: © 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
AB - This paper presents the findings of the third iteration of the AmericasNLP Shared Task on Machine Translation. This year’s competition features eleven Indigenous languages found across North, Central, and South America. A total of six teams participate with a total of 157 submissions across all languages and models. Two baselines – the Sheffield and Helsinki systems from 2023 – are provided and represent hard-to-beat starting points for the competition. In addition to the baselines, teams are given access to a new repository of training data which consists of data collected by teams in prior shared tasks. Using ChrF++ as the main competition metric, we see improvements over the baseline for 4 languages: Chatino, Guarani, Quechua, and Rarámuri, with performance increases over the best baseline of 4.2 ChrF++. In this work, we present a summary of the submitted systems, results, and a human evaluation of system outputs for Bribri, which consists of both (1) a rating of meaning and fluency and (2) a qualitative error analysis of outputs from the best submitted system.
UR - http://www.scopus.com/inward/record.url?scp=85210398215&partnerID=8YFLogxK
U2 - 10.18653/v1/2024.americasnlp-1.28
DO - 10.18653/v1/2024.americasnlp-1.28
M3 - Conference contribution
AN - SCOPUS:85210398215
T3 - AmericasNLP 2024 - 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas - Proceedings of the Workshop
SP - 236
EP - 246
BT - AmericasNLP 2024 - 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas - Proceedings of the Workshop
A2 - Mager, Manuel
A2 - Ebrahimi, Abteen
A2 - Rijhwani, Shruti
A2 - Oncevay, Arturo
A2 - Chiruzzo, Luis
A2 - Pugh, Robert
A2 - von der Wense, Katharina
PB - Association for Computational Linguistics (ACL)
T2 - 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas, AmericasNLP 2024
Y2 - 21 June 2024
ER -