Ship-LemmaTagger: Building an NLP Toolkit for a Peruvian Native Language

José Pereira-Noriega, Rodolfo Mercado-Gonzales, Andrés Melgar, Marco Sobrevilla-Cabezudo, Arturo Oncevay-Marcos

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

5 Citations (Scopus)

Abstract

Natural Language Processing deals with the understanding and generation of texts through computer programs. The field uses many different functionalities, but some of them support all the others. These core methods handle the morphology of the language (such as lemmatization) and the automatic identification of part-of-speech tags. This paper describes the implementation of a basic NLP toolkit for a new language, focusing on these features and testing them on a corpus built specifically for this purpose. The results exceeded expectations, and the toolkit could be used for more complex tasks such as machine translation.
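The abstract does not specify the models behind the two core functions it names. As a generic illustration only, the sketch shows a common baseline for both: a most-frequent-tag POS tagger and a lookup lemmatizer learned from an annotated corpus, evaluated by token-level accuracy. The corpus, the placeholder tokens (`w1`, `l1`, ...), the tag set, and the function names `train`, `annotate`, and `accuracy` are hypothetical assumptions; this is not the paper's method.

```python
from collections import Counter, defaultdict

# Hypothetical annotated corpus: each token is (form, lemma, POS tag).
# Placeholder strings only; these are NOT real data from the paper's corpus.
TRAIN = [
    [("w1", "l1", "NOUN"), ("w2", "l2", "VERB")],
    [("w1", "l1", "NOUN"), ("w3", "l2", "VERB")],
    [("w4", "l4", "ADJ"), ("w1", "l1", "VERB")],
]


def train(sentences):
    """Learn the most frequent tag and lemma for each surface form."""
    tag_counts = defaultdict(Counter)
    lemma_counts = defaultdict(Counter)
    all_tags = Counter()
    for sent in sentences:
        for form, lemma, tag in sent:
            tag_counts[form][tag] += 1
            lemma_counts[form][lemma] += 1
            all_tags[tag] += 1
    tagger = {f: c.most_common(1)[0][0] for f, c in tag_counts.items()}
    lemmatizer = {f: c.most_common(1)[0][0] for f, c in lemma_counts.items()}
    # Fallback for unseen forms: the most frequent tag in the whole corpus.
    default_tag = all_tags.most_common(1)[0][0]
    return tagger, lemmatizer, default_tag


def annotate(tokens, tagger, lemmatizer, default_tag):
    """Return (form, lemma, tag) triples; unseen forms keep their surface form as lemma."""
    return [(t, lemmatizer.get(t, t), tagger.get(t, default_tag)) for t in tokens]


def accuracy(gold_sentences, tagger, lemmatizer, default_tag):
    """Token-level lemma accuracy and tag accuracy against gold annotations."""
    total = lemma_ok = tag_ok = 0
    for sent in gold_sentences:
        forms = [form for form, _, _ in sent]
        predicted = annotate(forms, tagger, lemmatizer, default_tag)
        for (_, lemma, tag), (_, p_lemma, p_tag) in zip(sent, predicted):
            total += 1
            lemma_ok += lemma == p_lemma
            tag_ok += tag == p_tag
    return lemma_ok / total, tag_ok / total


if __name__ == "__main__":
    tagger, lemmatizer, default_tag = train(TRAIN)
    print(annotate(["w1", "w9"], tagger, lemmatizer, default_tag))
    print(accuracy(TRAIN, tagger, lemmatizer, default_tag))
```

Such a baseline is often used as a reference point: a morphology-aware lemmatizer or a context-sensitive tagger is expected to beat it, especially on unseen forms, which here simply fall back to the surface form and the corpus-wide majority tag.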

Original language: English
Host publication title: Text, Speech, and Dialogue - 20th International Conference, TSD 2017, Proceedings
Editors: Kamil Ekstein, Vaclav Matousek
Publisher: Springer Verlag
Pages: 473-481
Number of pages: 9
ISBN (Print): 9783319642055
DOI:
State: Published - 2017
Event: 20th International Conference on Text, Speech and Dialogue, TSD 2017 - Prague, Czech Republic
Duration: 27 Aug 2017 → 31 Aug 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10415 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 20th International Conference on Text, Speech and Dialogue, TSD 2017
Country/Territory: Czech Republic
City: Prague
Period: 27/08/17 → 31/08/17
