Ship-LemmaTagger: Building an NLP Toolkit for a Peruvian Native Language

José Pereira-Noriega, Rodolfo Mercado-Gonzales, Andrés Melgar, Marco Sobrevilla-Cabezudo, Arturo Oncevay-Marcos

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

5 Scopus citations

Abstract

Natural Language Processing (NLP) deals with the understanding and generation of text by computer programs. Many different functionalities are used in this area, but a few of them form the foundation that the rest depend on. These core methods concern the morphological processing of the language (such as lemmatization) and the automatic identification of part-of-speech tags. Accordingly, this paper describes the implementation of a basic NLP toolkit for a new language, focusing on these features, and evaluates them on a corpus built specifically for this work. The results exceeded expectations, and the toolkit could be used for more complex tasks such as machine translation.

Original language: English
Title of host publication: Text, Speech, and Dialogue - 20th International Conference, TSD 2017, Proceedings
Editors: Kamil Ekstein, Vaclav Matousek
Publisher: Springer Verlag
Pages: 473-481
Number of pages: 9
ISBN (Print): 9783319642055
State: Published - 2017
Event: 20th International Conference on Text, Speech and Dialogue, TSD 2017 - Prague, Czech Republic
Duration: 27 Aug 2017 to 31 Aug 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10415 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 20th International Conference on Text, Speech and Dialogue, TSD 2017
Country/Territory: Czech Republic
City: Prague
Period: 27/08/17 to 31/08/17

Keywords

  • Lemmatization
  • Low resource language
  • Part-of-speech tagging
  • Shipibo-Konibo
