TY - GEN
T1 - AmericasNLI: Evaluating Zero-shot Natural Language Inference for Pretrained Multilingual Models in Truly Low-resource Languages
T2 - 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022
AU - Ebrahimi, Abteen
AU - Mager, Manuel
AU - Oncevay, Arturo
AU - Chaudhary, Vishrav
AU - Chiruzzo, Luis
AU - Fan, Angela
AU - Ortega, John E.
AU - Ramos, Ricardo
AU - Rios, Annette
AU - Meza-Ruiz, Ivan
AU - Giménez-Lugo, Gustavo A.
AU - Mager, Elisabeth
AU - Neubig, Graham
AU - Palmer, Alexis
AU - Coto-Solano, Rolando
AU - Vu, Ngoc Thang
AU - Kann, Katharina
N1 - Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
AB - Pretrained multilingual models are able to perform cross-lingual transfer in a zero-shot setting, even for languages unseen during pretraining. However, prior work evaluating performance on unseen languages has largely been limited to low-level, syntactic tasks, and it remains unclear if zero-shot learning of high-level, semantic tasks is possible for unseen languages. To explore this question, we present AmericasNLI, an extension of XNLI (Conneau et al., 2018) to 10 Indigenous languages of the Americas. We conduct experiments with XLM-R, testing multiple zero-shot and translation-based approaches. Additionally, we explore model adaptation via continued pretraining and provide an analysis of the dataset by considering hypothesis-only models. We find that XLM-R's zero-shot performance is poor for all 10 languages, with an average performance of 38.48%. Continued pretraining offers improvements, with an average accuracy of 43.85%. Surprisingly, training on poorly translated data by far outperforms all other methods with an accuracy of 49.12%.
UR - http://www.scopus.com/inward/record.url?scp=85139890255&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85139890255
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 6279
EP - 6299
BT - ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
A2 - Muresan, Smaranda
A2 - Nakov, Preslav
A2 - Villavicencio, Aline
PB - Association for Computational Linguistics (ACL)
Y2 - 22 May 2022 through 27 May 2022
ER -