Peru is Multilingual, Its Machine Translation Should Be Too?

Arturo Oncevay

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Citations (Scopus)

Abstract

Peru is a multilingual country with a long history of contact between the indigenous languages and Spanish. Taking advantage of this context for machine translation is possible with multilingual approaches for learning both unsupervised subword segmentation and neural machine translation models. The study proposes the first multilingual translation models for four languages spoken in Peru: Aymara, Ashaninka, Quechua and Shipibo-Konibo, providing both many-to-Spanish and Spanish-to-many models and outperforming pairwise baselines in most of them. The task exploited a large English-Spanish dataset for pretraining, monolingual texts with tagged back-translation, and parallel corpora aligned with English. Finally, by fine-tuning the best models, we also assessed the out-of-domain capabilities in two evaluation datasets for Quechua and a new one for Shipibo-Konibo.

Original language: English
Title of host publication: Proceedings of the 1st Workshop on Natural Language Processing for Indigenous Languages of the Americas, AmericasNLP 2021
Editors: Manuel Mager, Arturo Oncevay, Annette Rios, Ivan Vladimir Meza Ruiz, Alexis Palmer, Graham Neubig, Katharina Kann
Publisher: Association for Computational Linguistics (ACL)
Pages: 194-201
Number of pages: 8
ISBN (electronic): 9781954085442
Publication status: Published - 2021
Externally published
Event: 1st Workshop on Natural Language Processing for Indigenous Languages of the Americas, AmericasNLP 2021 - Virtual, Online
Duration: 11 Jun 2021 → …

Publication series

Name: Proceedings of the 1st Workshop on Natural Language Processing for Indigenous Languages of the Americas, AmericasNLP 2021

Conference

Conference: 1st Workshop on Natural Language Processing for Indigenous Languages of the Americas, AmericasNLP 2021
City: Virtual, Online
Period: 11 Jun 2021 → …
