TY - GEN
T1 - A classification model for Portuguese documents in the juridical domain
AU - Pinto, Luis
AU - Melgar, Andres
N1 - Publisher Copyright: © 2016 AISTI.
PY - 2016/7/25
Y1 - 2016/7/25
AB - The attorney's office in Brazil receives a large number of notifications every day. Procurators must analyze these notifications manually to determine what kind of document to prepare in response. As a result, many notifications are not answered in time and become time-barred. This motivated the present work, whose main objective is to develop a computational model that understands the meaning of each notification and indicates what kind of response should be prepared in each situation. The model is built with machine-learning algorithms, and the problem is modeled as classification of free-text documents. The texts were extracted from notification documents written in Portuguese. Algorithm performance was assessed using the area under the curve (AUC). Four algorithms were evaluated: k-Nearest Neighbor, Support Vector Machine, Naive Bayes and Complement Naive Bayes. The algorithms were trained on a collection of Portuguese documents in the juridical domain comprising 5471 documents divided into 8 categories. A 25-fold cross-validation method was used to obtain an unbiased estimate of the performance of these prediction models. This paper is a comparative study of machine-learning algorithms for the problem of categorizing notifications. As a result of this study, a model was constructed to classify each document into its corresponding class. The AUC values of Support Vector Machine, k-Nearest Neighbor, Naive Bayes and Complement Naive Bayes were 0.846, 0.831, 0.815 and 0.712, respectively. The study shows that, of these four classification models, Support Vector Machine achieves the highest AUC.
KW - classification model
KW - k-nearest neighbor
KW - naïve bayes
KW - support vector machine
KW - text categorization Portuguese documents
UR - http://www.scopus.com/inward/record.url?scp=84982151614&partnerID=8YFLogxK
U2 - 10.1109/CISTI.2016.7521594
DO - 10.1109/CISTI.2016.7521594
M3 - Conference contribution
AN - SCOPUS:84982151614
T3 - Iberian Conference on Information Systems and Technologies, CISTI
BT - Proceedings of the 11th Iberian Conference on Information Systems and Technologies, CISTI 2016
A2 - Rocha, Alvaro
A2 - Reis, Luis Paulo
A2 - Cota, Manuel Perez
A2 - Goncalves, Ramiro
A2 - Suarez, Octavio Santana
PB - IEEE Computer Society
T2 - 11th Iberian Conference on Information Systems and Technologies, CISTI 2016
Y2 - 15 June 2016 through 18 June 2016
ER -