CLD2: Language Documentation Meets Natural Language Processing for Revitalising Endangered Languages

Roberto Zariquiey, Arturo Oncevay, Javier Vera

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Language revitalisation should not be understood as a direct outcome of language documentation, which focuses mainly on the creation of language repositories. Natural language processing (NLP) can complement and exploit these repositories by developing language technologies that may help improve the vitality status of endangered languages. In this paper, we discuss the current state of the interaction between language documentation and computational linguistics, diagnose how the outputs of recent documentation projects for endangered languages are under-utilised by the NLP community, and discuss how this situation could change from both the documentary linguistics and the NLP perspectives. We present all of this as a bridging paradigm dubbed Computational Language Documentation and Development (CLD2). CLD2 calls for (1) the inclusion of NLP-friendly annotated data as a deliverable of future language documentation projects; and (2) the exploitation of language documentation databases by the NLP community to promote the computerisation of endangered languages, as one way to contribute to their revitalisation.

Original language: English
Host publication title: COMPUTEL 2022 - 5th Workshop on the Use of Computational Methods in the Study of Endangered Languages, Proceedings of the Workshop
Editors: Sarah Moeller, Antonios Anastasopoulos, Antti Arppe, Aditi Chaudhary, Atticus Harrigan, Josh Holden, Jordan Lachler, Alexis Palmer, Shruti Rijhwani, Lane Schwartz
Publisher: Association for Computational Linguistics (ACL)
Pages: 20-30
Number of pages: 11
ISBN (electronic): 9781955917308
Status: Published - 2022
Externally published
Event: 5th Workshop on the Use of Computational Methods in the Study of Endangered Languages, COMPUTEL 2022 - Dublin, Ireland
Duration: 26 May 2022 to 27 May 2022

Publication series

Name: COMPUTEL 2022 - 5th Workshop on the Use of Computational Methods in the Study of Endangered Languages, Proceedings of the Workshop

Conference

Conference: 5th Workshop on the Use of Computational Methods in the Study of Endangered Languages, COMPUTEL 2022
Country/Territory: Ireland
City: Dublin
Period: 26/05/22 to 27/05/22
