WordNet-SHP: Towards the building of a lexical database for a Peruvian minority language

Diego Maguiño-Valencia, Arturo Oncevay-Marcos, Marco A. Sobrevilla Cabezudo

Research output: Chapter in book/report/conference proceedings; Conference contribution; peer-reviewed

2 Citations (Scopus)

Abstract

WordNet-like resources are lexical databases containing highly relevant information that can be exploited in more complex computational linguistics research and applications. Building them requires both manual and automatic tasks, which can be more arduous if the language is a minority one with few digital resources. This study focuses on the construction of an initial WordNet database for a low-resource indigenous language of Peru: Shipibo-Konibo (shp). First, the stages of development starting from a resource-scarce scenario (a bilingual shp-es dictionary) are described. Then, a synset alignment method is proposed that compares the definition glosses in the dictionary (written in Spanish) with the content of a Spanish WordNet, using word2vec similarity as the proximity measure. Finally, the synsets are evaluated against a manually annotated gold standard in Shipibo-Konibo. The results obtained are promising, and this resource is expected to serve well in further applications, such as word sense disambiguation and even machine translation for the shp-es language pair.
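
The gloss-comparison step described in the abstract can be illustrated with a minimal sketch. It assumes that each gloss is represented by the average of its word vectors and that candidates are ranked by cosine similarity; the abstract only states that word2vec similarity was the chosen metric, so the averaging, the hand-made three-dimensional vectors, the synset identifiers and the function names are all hypothetical and stand in for a real word2vec model and a real Spanish WordNet.

```python
import math

# Hypothetical toy vectors standing in for real word2vec embeddings.
TOY_VECTORS = {
    "animal": [0.9, 0.1, 0.0],
    "perro": [0.8, 0.2, 0.1],
    "doméstico": [0.7, 0.3, 0.0],
    "río": [0.0, 0.9, 0.2],
    "agua": [0.1, 0.8, 0.3],
    "corriente": [0.1, 0.7, 0.4],
}


def gloss_vector(gloss):
    """Average the vectors of the known tokens in a gloss (assumed pooling).

    Returns None when no token of the gloss has a vector.
    """
    vectors = [TOY_VECTORS[t] for t in gloss.lower().split() if t in TOY_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def align(dictionary_gloss, candidate_synsets):
    """Return (synset_id, score) for the most similar candidate gloss, or None."""
    query = gloss_vector(dictionary_gloss)
    if query is None:
        return None
    best = None
    for synset_id, synset_gloss in candidate_synsets.items():
        target = gloss_vector(synset_gloss)
        if target is None:
            continue
        score = cosine(query, target)
        if best is None or score > best[1]:
            best = (synset_id, score)
    return best


# Hypothetical candidate synsets with Spanish glosses.
CANDIDATES = {
    "synset-A": "animal doméstico",
    "synset-B": "corriente de agua",
}

# Spanish definition gloss taken from a (hypothetical) shp-es dictionary entry.
best = align("perro animal", CANDIDATES)
print(best)
```

With these toy vectors the dictionary gloss aligns to `synset-A`. In practice the vectors would come from a trained word2vec model, and a similarity threshold would likely be needed to reject weak matches.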

Original language: English
Title of host publication: LREC 2018 - 11th International Conference on Language Resources and Evaluation
Editors: Hitoshi Isahara, Bente Maegaard, Stelios Piperidis, Christopher Cieri, Thierry Declerck, Koiti Hasida, Helene Mazo, Khalid Choukri, Sara Goggi, Joseph Mariani, Asuncion Moreno, Nicoletta Calzolari, Jan Odijk, Takenobu Tokunaga
Publisher: European Language Resources Association (ELRA)
Pages: 4403-4407
Number of pages: 5
ISBN (electronic): 9791095546009
Status: Published - 2019
Event: 11th International Conference on Language Resources and Evaluation, LREC 2018 - Miyazaki, Japan
Duration: 7 May 2018 to 12 May 2018

Publication series

Name: LREC 2018 - 11th International Conference on Language Resources and Evaluation

Conference

Conference: 11th International Conference on Language Resources and Evaluation, LREC 2018
Country/Territory: Japan
City: Miyazaki
Period: 7/05/18 to 12/05/18
