TY - GEN
T1 - Documents Retrieval for Qualitative Research
T2 - 2018 IEEE Latin American Conference on Computational Intelligence, LA-CCI 2018
AU - Alatrista-Salas, Hugo
AU - Hidalgo-Leon, Pilar
AU - Nunez-Del-Prado, Miguel
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2019/1/23
Y1 - 2019/1/23
N2 - Gender discrimination is an act of exclusion or differential treatment toward a person because of their sex. Qualitative research has studied this phenomenon by seeking to analyze and describe the reality and context of discrimination. Qualitative researchers use collections of documents such as surveys and interviews, among other sources. These large full-text documents tend to be unstructured from a data science point of view. The data are often complex, and similar information tends to recur across documents. Nevertheless, relevant information is selected manually, which makes it difficult to categorize and analyze relevant pieces of information, such as victims' surveys. This motivates the use of tools that simplify the task of information selection and perform it efficiently. This article proposes two methods based on the TF-IDF measure for searching documents in a corpus. Our findings show that other methods, such as LSA (Latent Semantic Analysis) and LDA (Latent Dirichlet Allocation), consume a lot of memory and are less effective at extracting meaningful words than relying on TF-IDF alone. The information processed in this case consists of testimonies of gender discrimination among university students in Peru.
AB - Gender discrimination is an act of exclusion or differential treatment toward a person because of their sex. Qualitative research has studied this phenomenon by seeking to analyze and describe the reality and context of discrimination. Qualitative researchers use collections of documents such as surveys and interviews, among other sources. These large full-text documents tend to be unstructured from a data science point of view. The data are often complex, and similar information tends to recur across documents. Nevertheless, relevant information is selected manually, which makes it difficult to categorize and analyze relevant pieces of information, such as victims' surveys. This motivates the use of tools that simplify the task of information selection and perform it efficiently. This article proposes two methods based on the TF-IDF measure for searching documents in a corpus. Our findings show that other methods, such as LSA (Latent Semantic Analysis) and LDA (Latent Dirichlet Allocation), consume a lot of memory and are less effective at extracting meaningful words than relying on TF-IDF alone. The information processed in this case consists of testimonies of gender discrimination among university students in Peru.
KW - Data mining
KW - Discrimination
KW - Document retrieval
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=85060650571&partnerID=8YFLogxK
U2 - 10.1109/LA-CCI.2018.8625211
DO - 10.1109/LA-CCI.2018.8625211
M3 - Conference contribution
AN - SCOPUS:85060650571
T3 - 2018 IEEE Latin American Conference on Computational Intelligence, LA-CCI 2018
BT - 2018 IEEE Latin American Conference on Computational Intelligence, LA-CCI 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 6 November 2018 through 9 November 2018
ER -