A classification model for Portuguese documents in the juridical domain

Luis Pinto, Andres Melgar

Research output: Chapter in book/report/conference proceeding, Conference contribution, peer-reviewed

Abstract

The attorney's office in Brazil receives a large number of notifications every day. Procurators must analyze these notifications manually to determine what kind of document to prepare in response. As a result, many notifications are not answered in time and become time-barred. This motivated the present work, whose main objective is to develop a computational model that understands the meaning of each notification and indicates what kind of response should be prepared in each situation. The model is built with machine-learning algorithms, and the problem is framed as classification of free-text documents. The texts were extracted from notification documents written in Portuguese. Algorithm performance was assessed using the area under the curve (AUC). Four algorithms were evaluated in the experiment: k-Nearest Neighbor, Support Vector Machine, Naive Bayes and Complement Naive Bayes. The algorithms were trained on a collection of Portuguese documents in the juridical domain comprising 5471 documents divided into 8 categories. A 25-fold cross-validation method was used to obtain an unbiased performance estimate for these prediction models. This paper is a comparative study of machine-learning algorithms for the problem of categorizing notifications. As a result of the study, a model was constructed to classify each document into its corresponding class. The AUC values of Support Vector Machine, k-Nearest Neighbor, Naive Bayes and Complement Naive Bayes were 0.846, 0.831, 0.815 and 0.712, respectively. The study shows that, of these four classification models, Support Vector Machine achieves the highest AUC.
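The comparison described in the abstract could be set up roughly as in the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name a library, a feature representation, an SVM kernel, or any hyperparameters, so scikit-learn, TF-IDF features, a linear SVM and k = 3 are all assumptions here. The "area under the curve" is assumed to be the ROC AUC, computed one-vs-rest and macro-averaged over classes. The corpus is a hypothetical toy set of invented snippets with made-up labels, and the fold count is reduced to 4 because the toy set is tiny (the paper reports 25 folds on 5471 documents).

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import ComplementNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import label_binarize
from sklearn.svm import LinearSVC

# Hypothetical toy corpus: invented snippets and made-up response categories.
TEXTS = [
    "intimação para apresentar contestação no prazo legal",
    "citação do réu para contestar a ação",
    "prazo para contestação da ação de cobrança",
    "réu citado deve apresentar contestação",
    "sentença publicada cabe recurso de apelação",
    "intimação da sentença para interpor recurso",
    "prazo para apelação contra a sentença",
    "acórdão publicado prazo para recurso especial",
    "intimação para manifestação sobre o laudo pericial",
    "vista às partes para manifestação sobre documentos",
    "manifestar sobre os cálculos apresentados pelo perito",
    "prazo para manifestação acerca do laudo",
]
LABELS = ["contestacao"] * 4 + ["recurso"] * 4 + ["manifestacao"] * 4

# Each entry: (classifier, method that yields per-class scores).
# Hyperparameters are assumptions; the abstract does not report them.
MODELS = {
    "kNN": (KNeighborsClassifier(n_neighbors=3), "predict_proba"),
    "Linear SVM": (LinearSVC(), "decision_function"),
    "Naive Bayes": (MultinomialNB(), "predict_proba"),
    "Complement Naive Bayes": (ComplementNB(), "predict_proba"),
}


def compare_classifiers(texts, labels, n_splits):
    """Return {model name: macro one-vs-rest ROC AUC} from pooled
    out-of-fold scores of a stratified k-fold cross-validation."""
    y = np.asarray(labels)
    classes = np.unique(y)  # sorted, matching the score column order
    y_indicator = label_binarize(y, classes=classes)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    results = {}
    for name, (clf, method) in MODELS.items():
        # TF-IDF is fitted inside each training fold, so no test text leaks in.
        pipeline = make_pipeline(TfidfVectorizer(), clf)
        scores = cross_val_predict(pipeline, texts, y, cv=cv, method=method)
        results[name] = roc_auc_score(y_indicator, scores, average="macro")
    return results


results = compare_classifiers(TEXTS, LABELS, n_splits=4)
for name, auc in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{name}: AUC = {auc:.3f}")
```

On a real corpus, `n_splits=25` would mirror the fold count reported in the abstract. The numbers printed for the toy set carry no meaning and are not comparable to the reported results.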

Original language: English
Title of host publication: Proceedings of the 11th Iberian Conference on Information Systems and Technologies, CISTI 2016
Editors: Alvaro Rocha, Luis Paulo Reis, Manuel Perez Cota, Ramiro Goncalves, Octavio Santana Suarez
Publisher: IEEE Computer Society
ISBN (electronic): 9789899843462
DOI
Status: Published - 25 Jul 2016
Externally published
Event: 11th Iberian Conference on Information Systems and Technologies, CISTI 2016 - Gran Canaria, Spain
Duration: 15 Jun 2016 to 18 Jun 2016

Publication series

Name: Iberian Conference on Information Systems and Technologies, CISTI
Volume: 2016-July
ISSN (print): 2166-0727
ISSN (electronic): 2166-0735

Conference

Conference: 11th Iberian Conference on Information Systems and Technologies, CISTI 2016
Country/Territory: Spain
City: Gran Canaria
Period: 15/06/16 to 18/06/16
