TY - GEN
T1 - WordNet-SHP
T2 - 11th International Conference on Language Resources and Evaluation, LREC 2018
AU - Maguiño-Valencia, Diego
AU - Oncevay-Marcos, Arturo
AU - Sobrevilla Cabezudo, Marco A.
N1 - Publisher Copyright: © LREC 2018 - 11th International Conference on Language Resources and Evaluation. All rights reserved.
PY - 2019
Y1 - 2019
AB - WordNet-like resources are lexical databases containing highly relevant information that can be exploited in more complex computational linguistics research and applications. Building them requires both manual and automatic tasks, which can be more arduous when the language is a minority one with few digital resources. This study focuses on the construction of an initial WordNet database for a low-resource indigenous language of Peru: Shipibo-Konibo (shp). First, the stages of development from a resource-scarce scenario (a bilingual shp-es dictionary) are described. Then, a synset alignment method is proposed that compares the definition glosses in the dictionary (written in Spanish) with the content of a Spanish WordNet, using word2vec similarity as the proximity measure. Finally, the synsets are evaluated against a manually annotated Gold Standard in Shipibo-Konibo. The results are promising, and the resource is expected to serve well in further applications, such as word sense disambiguation and even machine translation for the shp-es language pair.
KW - Lexical database
KW - Minority language
KW - WordNet
UR - http://www.scopus.com/inward/record.url?scp=85059915834&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85059915834
T3 - LREC 2018 - 11th International Conference on Language Resources and Evaluation
SP - 4403
EP - 4407
BT - LREC 2018 - 11th International Conference on Language Resources and Evaluation
A2 - Isahara, Hitoshi
A2 - Maegaard, Bente
A2 - Piperidis, Stelios
A2 - Cieri, Christopher
A2 - Declerck, Thierry
A2 - Hasida, Koiti
A2 - Mazo, Helene
A2 - Choukri, Khalid
A2 - Goggi, Sara
A2 - Mariani, Joseph
A2 - Moreno, Asuncion
A2 - Calzolari, Nicoletta
A2 - Odijk, Jan
A2 - Tokunaga, Takenobu
PB - European Language Resources Association (ELRA)
Y2 - 7 May 2018 through 12 May 2018
ER -