TY - GEN
T1 - CLD2
T2 - 5th Workshop on the Use of Computational Methods in the Study of Endangered Languages, COMPUTEL 2022
AU - Zariquiey, Roberto
AU - Oncevay, Arturo
AU - Vera, Javier
N1 - Publisher Copyright: © 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
N2 - Language revitalisation should not be understood as a direct outcome of language documentation, which is mainly focused on the creation of language repositories. Natural language processing (NLP) offers the potential to complement and exploit these repositories through the development of language technologies that may contribute to improving the vitality status of endangered languages. In this paper, we discuss the current state of the interaction between language documentation and computational linguistics, present a diagnosis of how the outputs of recent documentation projects for endangered languages are under-utilised for the NLP community, and discuss how the situation could change from both the documentary linguistics and NLP perspectives. All this is introduced as a bridging paradigm dubbed as Computational Language Documentation and Development (CLD2). CLD2 calls for (1) the inclusion of NLP-friendly annotated data as a deliverable of future language documentation projects; and (2) the exploitation of language documentation databases by the NLP community to promote the computerization of endangered languages, as one way to contribute to their revitalization.
UR - http://www.scopus.com/inward/record.url?scp=85137114624&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85137114624
T3 - COMPUTEL 2022 - 5th Workshop on the Use of Computational Methods in the Study of Endangered Languages, Proceedings of the Workshop
SP - 20
EP - 30
BT - COMPUTEL 2022 - 5th Workshop on the Use of Computational Methods in the Study of Endangered Languages, Proceedings of the Workshop
A2 - Moeller, Sarah
A2 - Anastasopoulos, Antonios
A2 - Arppe, Antti
A2 - Chaudhary, Aditi
A2 - Harrigan, Atticus
A2 - Holden, Josh
A2 - Lachler, Jordan
A2 - Palmer, Alexis
A2 - Rijhwani, Shruti
A2 - Schwartz, Lane
PB - Association for Computational Linguistics (ACL)
Y2 - 26 May 2022 through 27 May 2022
ER -